In Tomcat 4 you can configure a filter in web.xml (it also needs a matching
<filter-mapping> entry to take effect):

<filter>
    <filter-name>SetCharacterEncodingFilter</filter-name>
    <filter-class>filters.SetCharacterEncodingFilter</filter-class>
    <init-param>
      <param-name>encoding</param-name>
      <param-value>UTF-8</param-value>
    </init-param>
</filter>

For the source, see "webapps\examples\WEB-INF\classes\filters\SetCharacterEncodingFilter.java".

If the header of your form page looks like this:

<head>
   <META http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>

and you submit the form, IE and Netscape send the form data UTF-8 encoded but do
not set the encoding of the request to UTF-8. If the encoding of the request is
not set, the server uses the default, ISO-8859-1, to decode it. In Tomcat you
can change the default with the SetCharacterEncodingFilter. But then you have to
use UTF-8 on all your forms; you cannot mix encodings.
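The mismatch described above can be reproduced outside the servlet container. The following is only a sketch: the class name FormDecodingDemo and the helper decode() are made up for illustration, and java.net.URLDecoder stands in for whatever the container does internally when it parses form parameters.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class, not part of Tomcat or Cocoon.
final class FormDecodingDemo {

    /** Decodes a URL-encoded form value using the given charset name. */
    static String decode(String rawValue, String charsetName) {
        try {
            return URLDecoder.decode(rawValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // e-grave (U+00E8) as a browser submits it from a UTF-8 page:
        // two bytes, 0xC3 0xA8, each percent-encoded.
        String raw = "%C3%A8";

        // Decoded with the server default: every byte becomes its own character.
        System.out.println(decode(raw, "ISO-8859-1").length()); // prints 2

        // Decoded with the encoding the browser actually used.
        System.out.println(decode(raw, "UTF-8").length()); // prints 1
    }
}
```

This is why the request encoding has to be set before the parameters are read: once the value has been decoded with the wrong charset, the application only sees the two-character result.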

Volker



From:     Mathias Brökelmann <mathias@mathias.d2g.com>
Date:     28.04.2002 11:07
Reply-To: cocoon-dev
To:       [EMAIL PROTECTED]
Cc:       (Bcc: Volker Schmitt/BASF-AG/BASF)
Subject:  AW: The encoding nightmare with StreamGenerator





Hi,

I think the problem is the servlet engine which parses the parameters
out of the request. StreamGenerator simply takes the parameters from the
request object.

Tomcat uses ISO-8859-1 as the character encoding if a browser such as IE
or Netscape does not send the character encoding to the server.
Bad thing: it is hard-coded in Tomcat, so you cannot configure the
default encoding (see the getReader() method of
org.apache.catalina.connector.RequestBase in the Tomcat sources).
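If the parameter has already been decoded as ISO-8859-1, a commonly used workaround (not mentioned in this thread, so treat it as an assumption) is to turn the string back into bytes and decode them again as UTF-8. A minimal sketch, with the hypothetical names MojibakeRepair and latin1ToUtf8():

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tomcat or Cocoon.
final class MojibakeRepair {

    /**
     * Re-interprets a string that was wrongly decoded as ISO-8859-1.
     * ISO-8859-1 maps every byte to exactly one character, so getBytes()
     * recovers the bytes the browser sent, which are then decoded as UTF-8.
     */
    static String latin1ToUtf8(String misdecoded) {
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // What the application sees for e-grave when the container
        // decoded the UTF-8 bytes 0xC3 0xA8 as ISO-8859-1.
        String fromContainer = "\u00C3\u00A8";

        String repaired = latin1ToUtf8(fromContainer);
        System.out.println(repaired.length());                      // prints 1
        System.out.println(Integer.toHexString(repaired.charAt(0))); // prints e8
    }
}
```

This only helps when the container really used ISO-8859-1 and the browser really sent UTF-8; applied to a correctly decoded string it would damage the text instead.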

The only solution I found is to send the POST as multipart/form-data
instead of application/x-www-form-urlencoded.

The result is that you get the content as binary, not already parsed
by the servlet engine. This should work especially well for XML streams,
because the <?xml version="1.0" encoding="UTF-8"?> declaration
identifies the encoding.
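To illustrate why binary content helps: when an XML parser is handed raw bytes rather than an already decoded string, it reads the encoding from the XML declaration itself. The sketch below uses the standard JAXP DocumentBuilder; the class name XmlBytesDemo and the helper abstractText() are invented for this example, and it says nothing about how StreamGenerator is implemented.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class, not part of Cocoon.
final class XmlBytesDemo {

    /** Parses raw XML bytes and returns the text of the first <abstract> element. */
    static String abstractText(byte[] xmlBytes) {
        try {
            // The parser receives bytes, so it picks the charset from the
            // encoding="..." declaration instead of relying on the container.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            return doc.getElementsByTagName("abstract").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<page><title>Title</title><abstract>\u00E8</abstract></page>";

        // Simulates the request body as it arrives on the wire.
        byte[] body = xml.getBytes(StandardCharsets.UTF_8);

        String text = abstractText(body);
        System.out.println(text.length());                      // prints 1
        System.out.println(Integer.toHexString(text.charAt(0))); // prints e8
    }
}
```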

Anyway, the StreamGenerator does not seem to be able to handle
multipart/form-data as the Content-Type. Why?

Hope that helps.

Mathias Broekelmann

> -----Original Message-----
> From: Robert Koberg [mailto:[EMAIL PROTECTED]]
> Sent: Sunday, 28 April 2002 00:28
> To: [EMAIL PROTECTED]
> Subject: Re: The encoding nightmare with StreamGenerator
>
> Hi Stefano.
>
> Is your xsl:output putting out utf-8 or iso?
>
> We have the same problem not using cocoon. We use JS to pre-parse for
> these kinds of things - trial and error... :(
>
> best,
> -Rob
>
>
> Stefano Mazzocchi wrote:
>
> >I have a browser that sends a POST request with:
> >
> >  content-type: application/x-www-form-urlencoded
> >
> >and the hidden field "content" is populated (using client-side
> >javascript) with some xml which looks like this
> >
> >   <?xml version="1.0" encoding="UTF-8"?>
> >   <page>
> >    <title>Title</title>
> >    <abstract>Ã¨</abstract>
> >    ...
> >   </page>
> >
> >the weird "Ã¨" text is the UTF-8 encoded value for [è] (depending on
> >your mail client you might not be getting any of the above as I
> >write it, but that's exactly part of the encoding nightmare that UTF was
> >designed to fix... but there is still a long way to go)
> >
> >Now, I use StreamGenerator to get this text, have it parsed and
> >feed my pipeline. So far so good.
> >
> >The problem is that stupid StreamGenerator doesn't recognize the
> >encoding (because the content-type doesn't have the 'charset=' part
> >defined (and IE can't be tweaked to emit that, AFAIK)) so it spits out the
> >characters "as they are" (as if they were ASCII encoded) (I used the
> >LogTransformer to witness this and the same weird 'Ã¨' appears in the
> >logs with no encoding translation taking place).
> >
> >It seems that StreamGenerator (or the parser instance it instantiates)
> >fails to see that 'Ã¨' is not two 8-bit chars but one 16-bit char.
> >
> >I'm positive the bug resides in StreamGenerator: in fact, if I tweak the
> >javascript to fill the form content with
> >
> >   <?xml version="1.0" encoding="BLAH"?>
> >
> >the parser doesn't even trigger an error.
> >
> >I'm going to investigate how to patch this since I need it badly! but if
> >you have any suggestions I'm all ears.
> >
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, email: [EMAIL PROTECTED]



