I have a browser that sends a POST request with

    content-type: application/x-www-form-urlencoded

and the hidden field "content" is populated (using client-side javascript) with some XML which looks like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <page>
     <title>Title</title>
     <abstract>Ã¨</abstract>
     ...
    </page>

The weird "Ã¨" text is the UTF-8 encoded value for [è]: the two bytes 0xC3 0xA8, shown as if each byte were a separate character. (Depending on your mail client you might not see any of the above as I wrote it, but that's exactly part of the encoding nightmare that UTF was designed to fix... there is still a long way to go.)

Now, I use StreamGenerator to get this text, have it parsed, and feed my pipeline. So far so good.

The problem is that stupid StreamGenerator doesn't recognize the encoding, because the content-type header doesn't carry the 'charset=' parameter (and IE can't be tweaked to emit that, AFAIK). So it spits out the characters "as they are", as if they were in a single-byte encoding such as ISO-8859-1. I used the LogTransformer to witness this: the same weird 'Ã¨' appears in the logs, with no encoding translation taking place.

It seems that StreamGenerator (or the parser instance it instantiates) fails to see that 'Ã¨' is not two 8-bit characters but a single character encoded as two bytes.

I'm positive the bug resides in StreamGenerator: in fact, if I tweak the javascript to fill the form content with

    <?xml version="1.0" encoding="BLAH"?>

the parser doesn't even trigger an error.

I'm going to investigate how to patch this since I need it badly! But if you have any suggestions, I'm all ears.

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
<[EMAIL PROTECTED]>    able to give birth to a dancing star.
                                          Friedrich Nietzsche
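
The byte-level effect described in this message can be reproduced outside Cocoon. The following is a minimal Python sketch, not StreamGenerator code: the helper name `decode_form_value` is hypothetical, and it assumes the browser percent-encodes the form value as UTF-8 while the receiving side falls back to ISO-8859-1 because no charset parameter is present.

```python
from urllib.parse import quote, unquote_to_bytes


def decode_form_value(raw: str, charset: str) -> str:
    """Percent-decode a form-urlencoded value to bytes, then decode with `charset`.

    Hypothetical helper for illustration only.
    """
    # In form-urlencoded data a literal '+' stands for a space.
    return unquote_to_bytes(raw.replace("+", " ")).decode(charset)


# What a UTF-8 page would put on the wire for the <abstract> element.
encoded = quote("<abstract>\u00e8</abstract>", safe="")

# Assumed receiver behaviour: no charset given, so fall back to ISO-8859-1.
wrong = decode_form_value(encoded, "iso-8859-1")

# Decoding the same bytes as UTF-8 recovers the single character.
right = decode_form_value(encoded, "utf-8")

# A possible workaround when only the mis-decoded string is available:
# turn it back into the original bytes and decode them as UTF-8.
repaired = wrong.encode("iso-8859-1").decode("utf-8")

print(encoded)
print(wrong)
print(right)
print(repaired == right)
```

The round trip in `repaired` works only because ISO-8859-1 maps every byte to exactly one character, so no information is lost by the wrong decode. Whether a similar fix fits inside StreamGenerator depends on how it reads the request, which this sketch does not model.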