Jos Snellings wrote:
> Hi,
> 
> HttpServletRequest looks 'imperfect':
> Cocoon 3, alpha 2.
> A generator accesses the HttpServletRequest in the setup method:
> 
> request = HttpContextHelper.getRequest(parameters);
> text = request.getParameter("tekst");
> 
> The pages, including forms, are encoded in utf-8.
> The String 'text' is strange: the original content (utf-8) is encoded
> once again:
> if the string on the form was one character, say 'é', the string has a
> length of 4 bytes. It is the result of utf-8 encoding the two byte
> character coming from the client. So, a second conversion is happening.
> 
> Now:
> new String(request.getParameter("text").getBytes("ISO-8859-1")) works
> fine.
> 
> Where should this be corrected?

Jos,

in Cocoon 3 there isn't any code that changes the encoding of request
parameters. The plain HttpServletRequest as provided by the servlet
container is used.

IIRC, Tomcat uses ISO-8859-1 by default, which follows the recommendation
of the Servlet API spec:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SRV.4.9 Request data encoding
Currently, many browsers do not send a char encoding qualifier with the
Content-Type header, leaving open the determination of the character
encoding for reading HTTP requests. The default encoding of a request
the container uses to create the request reader and parse POST data must
be “ISO-8859-1” if none has been specified by the client request.
However, in order to indicate to the developer in this case the failure
of the client to send a character encoding, the container returns null
from the getCharacterEncoding method.
If the client hasn’t set character encoding and the request data is
encoded with a different encoding than the default as described above,
breakage can occur. To remedy this situation, a new method
setCharacterEncoding(String enc) has been added to the ServletRequest
interface. Developers can override the character encoding supplied by
the container by calling this method. It must be called prior to parsing
any post data or reading any input from the request. Calling
this method once data has been read will not affect the encoding.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
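
To illustrate what that default means for your 'é', here is a minimal
sketch in plain Java, without any servlet classes. It assumes the browser
sends the form value as UTF-8 bytes and the container decodes them as
ISO-8859-1; the class and method names are made up for this example.

```java
import java.nio.charset.StandardCharsets;

public class EncodingDemo {

    // Simulates the container's default: the UTF-8 bytes sent by the
    // browser are decoded as ISO-8859-1, one char per byte.
    static String decodeAsLatin1(String original) {
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);
        return new String(wire, StandardCharsets.ISO_8859_1);
    }

    // Undoes the wrong decoding: get the raw bytes back via ISO-8859-1,
    // then decode them with the charset the client really used.
    static String repair(String misdecoded) {
        byte[] wire = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(wire, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00e9"; // 'é'
        String broken = decodeAsLatin1(original);
        System.out.println(broken.length());                                // 2
        System.out.println(broken.getBytes(StandardCharsets.UTF_8).length); // 4
        System.out.println(repair(broken).equals(original));                // true
    }
}
```

Note that new String(bytes) without an explicit charset uses the JVM's
default charset, so your workaround only gives the right result where
that default happens to be UTF-8.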

So, as some others suggested, the best option is to use one of the
CharacterEncoding servlet filters rather than to remedy this situation
somewhere in C3.

-- 
Reinhard Pötz                           Managing Director, {Indoqa} GmbH
                         http://www.indoqa.com/en/people/reinhard.poetz/

Member of the Apache Software Foundation
Apache Cocoon Committer, PMC member                  reinh...@apache.org
________________________________________________________________________
