In Tomcat 4 you can configure a filter:

  <filter>
    <filter-name>SetCharacterEncodingFilter</filter-name>
    <filter-class>filters.SetCharacterEncodingFilter</filter-class>
    <init-param>
      <param-name>encoding</param-name>
      <param-value>UTF-8</param-value>
    </init-param>
  </filter>

See "webapps\examples\WEB-INF\classes\filters\SetCharacterEncodingFilter.java".

If your form header looks like this:

  <head>
    <META http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>

and you submit the form, IE and Netscape send the form data UTF-8 encoded but do not set the encoding of the request to UTF-8. If the encoding of the request is not set, the server uses the default, ISO-8859-1, to decode the request. In Tomcat you can change the default using the SetCharacterEncodingFilter. But then you have to use UTF-8 on all your forms; you cannot mix encodings.

Volker


From: Mathias Brökelmann <mathias@mathias.d2g.com>
Date: 28.04.2002 11:07
Please reply to cocoon-dev
To: [EMAIL PROTECTED]
Cc: (Bcc: Volker Schmitt/BASF-AG/BASF)
Subject: Re: The encoding nightmare with StreamGenerator

Hi,

I think the problem is the servlet engine, which parses the parameters out of the request. StreamGenerator simply takes the parameters from the request object. Tomcat will use ISO-8859-1 as the character encoding if the browser (IE or Netscape, for example) does not send the character encoding to the server. Bad thing: it is hard-coded in Tomcat, so you cannot configure the default encoding (see the Tomcat sources: org.apache.catalina.connector.RequestBase, method getReader()).

The only solution I found is not to send the POST as application/x-www-form-urlencoded but as multipart/form-data.
The result is that you get the content as binary, not already parsed by the servlet engine. This should also work well for XML streams in particular, because the <?xml version="1.0" encoding="UTF-8"?> declaration identifies the encoding. However, the StreamGenerator does not seem to be able to handle multipart/form-data as the content type. Why?

Hope that helps.

Mathias Broekelmann

> -----Original Message-----
> From: Robert Koberg [mailto:[EMAIL PROTECTED]]
> Sent: Sunday, 28 April 2002 00:28
> To: [EMAIL PROTECTED]
> Subject: Re: The encoding nightmare with StreamGenerator
>
> Hi Stefano.
>
> Is your xsl:output putting out utf-8 or iso?
>
> We have the same problem not using cocoon. We use JS to pre-parse for
> these kinds of things - trial and error... :(
>
> best,
> -Rob
>
>
> Stefano Mazzocchi wrote:
>
> >I have a browser that sends a POST request with:
> >
> >   content-type: application/x-www-form-urlencoded
> >
> >and the hidden field "content" is populated (using client-side
> >javascript) with some xml which looks like this
> >
> >   <?xml version="1.0" encoding="UTF-8"?>
> >   <page>
> >     <title>Title</title>
> >     <abstract>è</abstract>
> >     ...
> >   </page>
> >
> >the weird "è" text is the UTF-8 encoded value for [è] (depending on
> >your mail client you might not be getting anything of the above as I
> >write it, but that's exactly part of the encoding nightmare that UTF was
> >designed to fix... but there is still a long way to go)
> >
> >Now, I have to use StreamGenerator to get this text, have it parsed and
> >feed my pipeline. So far so good.
> >
> >The problem is that stupid StreamGenerator doesn't recognize the
> >encoding (because the content-type doesn't have the 'charset=' part
> >defined (and IE can't be tweaked to emit that, AFAIK)) so it spits the
> >characters "as they are" (as if they were ASCII encoded) (I used the
> >LogTransformer to witness this and the same weird 'è' appears in the
> >logs with no encoding translation taking place).
> >
> >It seems that StreamGenerator (or the parser instance it instantiates)
> >fails to see that 'è' is not two 8-bit chars but one 16-bit char.
> >
> >I'm positive the bug resides in StreamGenerator: in fact, if I tweak the
> >javascript to fill the form content with
> >
> >   <?xml version="1.0" encoding="BLAH"?>
> >
> >the parser doesn't even trigger an error.
> >
> >I'm going to investigate how to patch this since I need it badly! but if
> >you have any suggestions I'm all ears.
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, email: [EMAIL PROTECTED]
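
The mis-decoding discussed in this thread can be reproduced outside Tomcat and Cocoon. What follows is only a minimal sketch using the Java standard library: the class EncodingDemo and the methods decodeAsLatin1 and repair are hypothetical names made up for illustration and are not part of Tomcat, Cocoon or StreamGenerator. It assumes the server decoded the bytes as ISO-8859-1, as described above; under that assumption the wrong decode loses no information and can be reversed.

```java
import java.nio.charset.StandardCharsets;

// Illustration only: these names are hypothetical and not part of Tomcat or Cocoon.
public class EncodingDemo {

    // Decode a request body as ISO-8859-1, which is what the thread says
    // the server does when the request carries no charset.
    static String decodeAsLatin1(byte[] body) {
        return new String(body, StandardCharsets.ISO_8859_1);
    }

    // Undo a wrong ISO-8859-1 decode: ISO-8859-1 maps every byte to exactly
    // one char, so the original bytes can be recovered and decoded as UTF-8.
    static String repair(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00e8";                               // e with grave accent
        byte[] sent = original.getBytes(StandardCharsets.UTF_8); // bytes 0xC3 0xA8
        String seen = decodeAsLatin1(sent);                       // two chars: U+00C3 U+00A8

        System.out.println("bytes sent:     " + sent.length);
        System.out.println("chars seen:     " + seen.length());
        System.out.println("repaired match: " + repair(seen).equals(original));
    }
}
```

The repair step only works because ISO-8859-1 assigns a character to every byte value. If some other single-byte encoding had been applied instead, bytes without a mapping could be lost, so setting the request encoding before the parameters are read (as the filter does) remains the cleaner approach.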