Jos Snellings wrote:
> Hi,
>
> HttpServletRequest looks 'imperfect':
> Cocoon 3, alpha 2.
> A generator accesses the HttpServletRequest in the setup method:
>
> request = HttpContextHelper.getRequest(parameters);
> text = request.getParameter("tekst");
>
> The pages, including forms, are encoded in utf-8.
> The String 'text' is strange: the original content (utf-8) is encoded
> once again:
> if the string on the form was one character, say 'é', the string has a
> length of 4 bytes. It is the result of utf-8 encoding the two byte
> character coming from the client. So, a second conversion is happening.
>
> Now:
> new String(request.getParameter("text").getBytes("ISO-8859-1")) works
> fine.
>
> Where should this be corrected?

Jos,

in Cocoon 3 there isn't any code that changes the encoding of request
parameters. The plain HttpServletRequest as provided by the servlet
container is used. IIRC Tomcat uses ISO-8859-1 by default, which follows
the recommendation of the Servlet API spec:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SRV.4.9 Request data encoding

Currently, many browsers do not send a char encoding qualifier with the
Content-Type header, leaving open the determination of the character
encoding for reading HTTP requests. The default encoding of a request
the container uses to create the request reader and parse POST data must
be “ISO-8859-1” if none has been specified by the client request.
However, in order to indicate to the developer in this case the failure
of the client to send a character encoding, the container returns null
from the getCharacterEncoding method.

If the client hasn’t set character encoding and the request data is
encoded with a different encoding than the default as described above,
breakage can occur. To remedy this situation, a new method
setCharacterEncoding(String enc) has been added to the ServletRequest
interface. Developers can override the character encoding supplied by
the container by calling this method. It must be called prior to parsing
any post data or reading any input from the request. Calling this method
once data has been read will not affect the encoding.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

So, as some others suggested, the best option is to use one of the
CharacterEncoding servlet filters rather than to work around this
somewhere in C3.

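For what it's worth, the effect you describe can be reproduced outside
any container. The following is only a sketch in plain Java: the class
EncodingDemo and its two methods are made-up names for illustration, not
Cocoon or Servlet API code, and the container's behaviour is simulated
by an explicit ISO-8859-1 decode of UTF-8 bytes.

```java
import java.nio.charset.StandardCharsets;

public class EncodingDemo {

    // Hypothetical stand-in for the container: decodes the raw request
    // bytes as ISO-8859-1 because the client named no charset.
    static String decodeAsLatin1(byte[] rawRequestBytes) {
        return new String(rawRequestBytes, StandardCharsets.ISO_8859_1);
    }

    // Undoes the wrong decode. ISO-8859-1 maps every char 0..255 back to
    // the same byte value, so the original UTF-8 bytes are recovered and
    // can then be decoded with the charset the form really used.
    static String repair(String misdecoded) {
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String typed = "\u00e9";                                 // 'é' as entered in the form
        byte[] onTheWire = typed.getBytes(StandardCharsets.UTF_8); // two bytes: C3 A9

        String seen = decodeAsLatin1(onTheWire);                 // two chars instead of one
        System.out.println("chars seen by the generator: " + seen.length());
        System.out.println("bytes when re-encoded as UTF-8: "
                + seen.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("repaired equals original: " + repair(seen).equals(typed));
    }
}
```

Note that the charset is passed explicitly in repair(). The one-argument
new String(byte[]) uses the platform default charset, so the workaround
from your mail only gives the right result where that default happens to
be UTF-8. If a filter calls request.setCharacterEncoding("UTF-8") before
the first parameter is read, POST data should not need any such repair.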
-- 
Reinhard Pötz
Managing Director, {Indoqa} GmbH
http://www.indoqa.com/en/people/reinhard.poetz/

Member of the Apache Software Foundation
Apache Cocoon Committer, PMC member
reinh...@apache.org