Hi Stefano. Is your xsl:output emitting UTF-8 or ISO-8859-1?
We have the same problem, and we're not using Cocoon. We use JS to pre-parse for these kinds of things: trial and error... :(

best,
-Rob

Stefano Mazzocchi wrote:

>I have a browser that sends a POST request with:
>
>  content-type: application/x-www-form-urlencoded
>
>and the hidden field "content" is populated (using client-side
>JavaScript) with some XML that looks like this:
>
>  <?xml version="1.0" encoding="UTF-8"?>
>  <page>
>   <title>Title</title>
>   <abstract>Ã¨</abstract>
>   ...
>  </page>
>
>The weird "Ã¨" text is the UTF-8 encoded value for [è] (depending on
>your mail client you might not be seeing any of the above as I
>write it, but that's exactly part of the encoding nightmare that UTF was
>designed to fix... but there is still a long way to go).
>
>Now, I use StreamGenerator to get this text, have it parsed, and
>feed it into my pipeline. So far so good.
>
>The problem is that stupid StreamGenerator doesn't recognize the
>encoding, because the content-type doesn't define the 'charset'
>parameter (and IE can't be tweaked to emit it, AFAIK). So it spits out
>the characters "as they are", as if they were one byte per character
>(I used the LogTransformer to witness this, and the same weird "Ã¨"
>appears in the logs with no encoding translation taking place).
>
>It seems that StreamGenerator (or the parser instance it instantiates)
>fails to see that "Ã¨" is not two 8-bit chars but one 16-bit char.
>
>I'm positive the bug resides in StreamGenerator: in fact, if I tweak the
>JavaScript to fill the form content with
>
>  <?xml version="1.0" encoding="BLAH"?>
>
>the parser doesn't even trigger an error.
>
>I'm going to investigate how to patch this since I need it badly! But if
>you have any suggestions, I'm all ears.
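For what it's worth, here is a minimal Java sketch of what might be going on. It assumes (unverified against the Cocoon source) that the generator receives the already-decoded request parameter as a String and hands it to the parser as a character stream (java.io.Reader), and that the container decoded the UTF-8 form bytes as ISO-8859-1 because no charset was sent. The SAX InputSource contract says that when a character stream is supplied, the parser reads it directly and disregards the encoding declaration, which would also explain why encoding="BLAH" raises no error. The class and method names are hypothetical and are not Cocoon's.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of Cocoon.
final class EncodingDemo {

    // Hypothetical stand-in for a servlet container that, finding no charset
    // parameter on the request, decodes UTF-8 form bytes as ISO-8859-1.
    static String decodeAsLatin1(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Character-stream input: per the SAX InputSource contract the parser
    // reads the characters directly and disregards encoding="...".
    static String abstractFromReader(String xml) {
        return parseAbstract(new InputSource(new StringReader(xml)));
    }

    // Byte-stream input: the parser decodes the bytes itself, using the
    // encoding named in the XML declaration.
    static String abstractFromBytes(byte[] xml) {
        return parseAbstract(new InputSource(new ByteArrayInputStream(xml)));
    }

    // Parses the document and returns the text of its first <abstract> element.
    private static String parseAbstract(InputSource source) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(source);
            return doc.getElementsByTagName("abstract").item(0).getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<page><title>Title</title><abstract>\u00e8</abstract></page>";

        // What the generator would see: U+00C3 U+00A8 instead of one char.
        String mangled = decodeAsLatin1(xml);
        String viaReader = abstractFromReader(mangled);
        System.out.println("reader path: " + viaReader.length() + " char(s)");

        // Workaround: ISO-8859-1 maps each char back to exactly one byte, so
        // this recovers the original UTF-8 bytes; the parser then honors
        // encoding="UTF-8" from the declaration.
        byte[] rawBytes = mangled.getBytes(StandardCharsets.ISO_8859_1);
        String viaBytes = abstractFromBytes(rawBytes);
        System.out.println("byte path: " + viaBytes.length() + " char(s)");
    }
}
```

Running main should report two characters on the reader path and one on the byte path. The re-encode trick only holds if the container really used ISO-8859-1; a cleaner patch would probably give the parser an InputStream over the raw bytes so the XML declaration can do its job.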