Hi Stefano.

Is your xsl:output putting out utf-8 or iso?

We have the same problem without using Cocoon. We use JS to pre-parse for
these kinds of things - trial and error... :(

best,
-Rob


Stefano Mazzocchi wrote:

>I have a browser that sends a POST request with:
>
>  content-type: application/x-www-form-urlencoded
>
>and the hidden field "content" is populated (using client-side
>javascript) with some xml which looks like this
>
>   <?xml version="1.0" encoding="UTF-8"?>
>   <page>
>    <title>Title</title>
>    <abstract>è</abstract>
>    ...
>   </page>
>
>the weird "è" text is the UTF-8 encoded value for [è] (depending on
>your mail client you might not be getting any of the above as I
>write it, but that's exactly part of the encoding nightmare that UTF was
>designed to fix... but there is still a long way to go)
>
>Now, I use StreamGenerator to get this text, have it parsed and
>feed my pipeline. So far so good.
>
>The problem is that stupid StreamGenerator doesn't recognize the
>encoding (because the content-type doesn't have the 'charset=' part
>defined (and IE can't be tweaked to emit that, AFAIK)) so it spits the
>characters "as they are" (as if they were ASCII encoded) (I used the
>LogTransformer to witness this and the same weird 'è' appears in the
>logs with no encoding translation taking place).
>
>It seems that StreamGenerator (or the parser instance it instantiates)
>fails to see that 'è' is not two 8-bit chars but one 16-bit char.
>
>I'm positive the bug resides in StreamGenerator: in fact, if I tweak the
>javascript to fill the form content with
>
>   <?xml version="1.0" encoding="BLAH"?>
>
>the parser doesn't even trigger an error.
>
>I'm going to investigate how to patch this since I need it badly, but if
>you have any suggestions I'm all ears.
>
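
For what it's worth, here is a minimal sketch of the symptom described
above, in Python rather than Cocoon's Java, just to show the byte-level
effect. It is not StreamGenerator's code. The field name "content" comes
from the mail; the sample body and the helper names decode_form and
repair are made up for illustration. It assumes the receiving side falls
back to ISO-8859-1 when the content-type carries no charset.

```python
from urllib.parse import parse_qs

# Hypothetical form body: the browser percent-encodes the UTF-8 bytes
# of "è" (0xC3 0xA8) in the hidden "content" field.
BODY = "content=%C3%A8"


def decode_form(body: str, charset: str) -> str:
    """Decode the 'content' field, interpreting the bytes as `charset`."""
    return parse_qs(body, encoding=charset)["content"][0]


def repair(text: str) -> str:
    """Undo a wrong ISO-8859-1 decode of bytes that were really UTF-8."""
    return text.encode("iso-8859-1").decode("utf-8")


if __name__ == "__main__":
    wrong = decode_form(BODY, "iso-8859-1")  # two chars: the mojibake
    right = decode_form(BODY, "utf-8")       # one char: the intended text
    print(len(wrong), len(right))
    print(repair(wrong) == right)
```

If the wrongly decoded text can be mapped back to the original bytes, as
in repair() here, the damage is reversible. That only holds when the
wrong decode was a lossless single-byte charset such as ISO-8859-1.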



