Hi, I think the problem is the servlet engine which parses the parameters out of the request. StreamGenerator simply takes the parameters from the request object.
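
To make the suspected failure concrete, here is a minimal sketch, assuming the container falls back to ISO-8859-1 when the browser sends no charset. FormEncodingSketch and its methods are hypothetical names for illustration, not part of Cocoon or Tomcat. It decodes the UTF-8 percent-encoded form value with the wrong charset, which yields the two-character "è", and then undoes the damage by going back to the bytes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not a Cocoon or Tomcat class.
class FormEncodingSketch {

    // Decode a urlencoded value the way a container would if it
    // assumed ISO-8859-1 (the assumption this sketch is built on).
    static String decodeAsLatin1(String raw) {
        try {
            return URLDecoder.decode(raw, "ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            // ISO-8859-1 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    // Undo the wrong decoding: ISO-8859-1 maps every byte to exactly one
    // char, so the original bytes can be recovered and re-read as UTF-8.
    static String repair(String misdecoded) {
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String raw = "%C3%A8";              // UTF-8 percent-encoding of U+00E8
        String wrong = decodeAsLatin1(raw); // two chars: U+00C3 U+00A8
        String fixed = repair(wrong);       // one char: U+00E8
        System.out.println(wrong.length() + " -> " + fixed.length());
    }
}
```

The round trip in repair() only works because nothing has been lost yet. If some other layer has already replaced unknown bytes with '?', the original text cannot be recovered this way.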
Tomcat will use ISO-8859-1 as the character encoding if the browser (IE or Netscape, for example) does not send the character encoding to the server. The bad thing: it is hard-coded in Tomcat, so you cannot configure the default encoding (see the Tomcat sources, org.apache.catalina.connector.RequestBase, method getReader()).

The only solution I have found is not to send the POST as application/x-www-form-urlencoded but as multipart/form-data. The result is that you get the content as binary, not already parsed by the servlet engine. This should work especially well for XML streams, because the <?xml version="1.0" encoding="UTF-8"?> declaration identifies the encoding.

Anyway, the StreamGenerator does not seem to be able to handle multipart/form-data as Content-Type. Why?

Hope that helps.

Mathias Broekelmann

> -----Original Message-----
> From: Robert Koberg [mailto:[EMAIL PROTECTED]]
> Sent: Sunday, April 28, 2002 00:28
> To: [EMAIL PROTECTED]
> Subject: Re: The encoding nightmare with StreamGenerator
>
> Hi Stefano.
>
> Is your xsl:output putting out utf-8 or iso?
>
> We have the same problem not using cocoon. We use JS to pre-parse for
> these kinds of things - trial and error... :(
>
> best,
> -Rob
>
>
> Stefano Mazzocchi wrote:
>
> >I have a browser that sends a POST request with:
> >
> > content-type: application/x-www-form-urlencoded
> >
> >and the hidden field "content" is populated (using client-side
> >javascript) with some xml which looks like this
> >
> > <?xml version="1.0" encoding="UTF-8"?>
> > <page>
> > <title>Title</title>
> > <abstract>è</abstract>
> > ...
> > </page>
> >
> >the weird "è" text is the UTF-8 encoded value for [è] (depending on
> >your mail client you might not be getting anything of the above as I
> >write it, but that's exactly part of the encoding nightmare that UTF was
> >designed to fix... but there is still a long way to go)
> >
> >Now, I use StreamGenerator to get this text, have it parsed and
> >feed my pipeline. So far so good.
> >
> >The problem is that stupid StreamGenerator doesn't recognize the
> >encoding (because the content-type doesn't have the 'charset:' part
> >defined (and IE can't be tweaked to emit that, AFAIK)) so it spits the
> >characters "as they are" (as if they were ASCII encoded) (I used the
> >LogTransformer to witness this and the same weird 'è' appears in the
> >logs with no encoding translation taking place).
> >
> >It seems that StreamGenerator (or the parser instance it instantiates)
> >fails to see that 'è' is not two 8-bit chars but one 16-bit char.
> >
> >I'm positive the bug resides in StreamGenerator: in fact, if I tweak the
> >javascript to fill the form content with
> >
> > <?xml version="1.0" encoding="BLAH"?>
> >
> >the parser doesn't even trigger an error.
> >
> >I'm going to investigate how to patch this since I need it badly! but if
> >you have any suggestions I'm all ears.
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, email: [EMAIL PROTECTED]