On 16/3/03 23:38, "Vadim Gritsenko" <[EMAIL PROTECTED]> wrote:

>> true. but you can't have chinese text in US-ASCII, right?
>
> Even if you can not that anybody will be able to read it ;-)
> So yes, right.

Unicode specifies (somewhere) that any character not representable in
the current charset encoding should be replaced with a "?" (\u003f),
which exists in all representations...

>>> But I am not convinced that it's sitemap's responsibility to worry
>>> about encoding (from SoC POV).
>>
>> I restate:
>>
>> 1) I want a way for serializers to indicate to the pipeline what is
>> the encoding they will be using, so that the pipeline can set the
>> right HTTP header for it.
>
> +-0, I'm not sure (yet) on this one...

I am almost sure that it should work the other way around: the client
can request a specific encoding from the server. See RFC 2616, section
14.2, page 102: the Accept-Charset header.

I believe that the TextSerializer should return what the client asked
for in its request through the "Accept-Charset" header, if this is
present. If it isn't, it should default to what has been specified in
the pipeline (if we use <map:serialize charset="xxxx"/>) or default to
the "cocoon global" configuration...

>> 2) also, i want a way to overwrite the sitemap-wide behavior of every
>> single serializers, locally, such as
>>
>> <map:serialize encoding="UTF-8"/>
>>
>> when the global serializer configurations state they will be using
>> something else.
>
> But this one is Ok with me and, more over, in line with earlier decision:
> http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=101826371615914&w=2

I'd say to use this only if the client didn't request a particular
encoding...

On another thought... The cache should store Unicode characters "as
is", not bytes, as the bytes might change for the same request URL
depending on the different headers in the request...

    Pier
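
P.S. To make the fallback order above concrete, here is a rough Java
sketch: take the best supported charset from Accept-Charset, else the
per-pipeline setting, else the global default. The class and method
names (CharsetNegotiation, choose, lookup, encodeLossy) are made up for
illustration and are not Cocoon APIs; the q-value parsing is simplified
and the "*" wildcard is simply ignored. The last method also shows the
"?" substitution that Java applies to unmappable characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Cocoon: illustrates the proposed order only.
class CharsetNegotiation {

    // Returns the named charset, or null if the name is missing, illegal
    // (e.g. "*") or not supported by this JVM.
    static Charset lookup(String name) {
        if (name == null || name.trim().isEmpty()) return null;
        try {
            String n = name.trim();
            return Charset.isSupported(n) ? Charset.forName(n) : null;
        } catch (IllegalArgumentException e) {
            return null; // illegal charset name
        }
    }

    // 1) highest-q supported charset from Accept-Charset (first one wins ties),
    // 2) else the charset set on the pipeline, 3) else the global default.
    static Charset choose(String acceptCharset, String pipelineCharset,
                          Charset globalDefault) {
        if (acceptCharset != null) {
            Charset best = null;
            double bestQ = 0.0;
            for (String item : acceptCharset.split(",")) {
                String[] parts = item.trim().split(";");
                double q = 1.0; // RFC 2616: a missing q-value means q=1
                for (int i = 1; i < parts.length; i++) {
                    String p = parts[i].trim();
                    if (p.startsWith("q=")) {
                        try {
                            q = Double.parseDouble(p.substring(2).trim());
                        } catch (NumberFormatException e) {
                            q = 0.0; // unparsable q-value: treat as not acceptable
                        }
                    }
                }
                if (q <= bestQ) continue; // also skips q=0 entries
                Charset candidate = lookup(parts[0]);
                if (candidate != null) {
                    best = candidate;
                    bestQ = q;
                }
            }
            if (best != null) return best;
        }
        Charset fromPipeline = lookup(pipelineCharset);
        return fromPipeline != null ? fromPipeline : globalDefault;
    }

    // Encodes and decodes again; String.getBytes(Charset) replaces unmappable
    // characters with the charset's replacement byte ("?" for US-ASCII).
    static String encodeLossy(String text, Charset cs) {
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        Charset cs = choose("iso-8859-1;q=0.5, utf-8", "US-ASCII",
                            StandardCharsets.ISO_8859_1);
        System.out.println(cs.name());
        System.out.println(encodeLossy("\u4e2d\u6587", StandardCharsets.US_ASCII));
    }
}
```

Note that anything cached as bytes would be tied to whichever charset
this picks for a given request, which is the reason for caching
characters instead.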