That is right! It is just a confusing situation :-( The filter works fine. The init() method of a generator does not give a chance to call setCharacterEncoding, as the parsing has already happened by then. The good thing is that the code is already in Spring, so no new external dependencies are needed. Maybe later on I will add a "tryToGuessEncodingFilter".
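
Why init() is too late can be illustrated without a servlet container. The sketch below is only an approximation: it assumes the container decodes percent-encoded form data roughly the way java.net.URLDecoder does, and FormDecodingDemo and decodeValue are hypothetical names, not part of Cocoon or the Servlet API. The charset is fixed at the moment the value is decoded, so only code that runs before the parameters are parsed (such as a filter) can influence the result.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class: it only mimics how a container might decode
// form data; it is not Cocoon or servlet container code.
class FormDecodingDemo {

    // Decode one percent-encoded form value with the given charset name.
    // The charset has to be known at this point; changing it afterwards
    // does not alter the String that was already produced.
    static String decodeValue(String rawValue, String charsetName) {
        try {
            return URLDecoder.decode(rawValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // A browser submitting 'é' from a UTF-8 page sends the bytes C3 A9.
        String raw = "%C3%A9";

        // Default when no encoding was set before parsing.
        String asLatin1 = decodeValue(raw, "ISO-8859-1");
        // What an encoding filter would select up front.
        String asUtf8 = decodeValue(raw, "UTF-8");

        System.out.println(asLatin1.length()); // prints 2 (two garbled characters)
        System.out.println(asUtf8.length());   // prints 1 (the single 'é')
    }
}
```
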
Jos

On Mon, 2010-01-11 at 10:49 +0100, Reinhard Pötz wrote:
> Jos Snellings wrote:
> > Hi,
> >
> > HttpServletRequest looks 'imperfect':
> > Cocoon 3, alpha 2.
> > A generator accesses the HttpServletRequest in the setup method:
> >
> > request = HttpContextHelper.getRequest(parameters);
> > text = request.getParameter("tekst");
> >
> > The pages, including forms are encoded in utf-8.
> > The String 'text' is strange: the original content (utf-8) is encoded
> > once again:
> > if the string on the form was one character, say 'é', the string has a
> > length of 4 bytes. It is the result of utf-8 encoding the two byte
> > character coming from the client. So, a second conversion is happening.
> >
> > Now:
> > new String(request.getParameter("text").getBytes("ISO-8859-1")) works
> > fine.
> >
> > Where should this be corrected?
>
> Jos,
>
> in Cocoon 3 there isn't any code that changes the encoding of request
> parameters. The plain HttpServletRequest as provided by the servlet
> container is used.
>
> IIRC Tomcat uses ISO-8859-1 by default which follows the recommendation
> of the Servlet API spec:
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> SRV.4.9 Request data encoding
> Currently, many browsers do not send a char encoding qualifier with the
> Content-Type header, leaving open the determination of the character
> encoding for reading HTTP requests. The default encoding of a request
> the container uses to create the request reader and parse POST data must
> be “ISO-8859-1” if none has been specified by the client request.
> However, in order to indicate to the developer in this case the failure
> of the client to send a character encoding, the container returns null
> from the getCharacterEncoding method.
> If the client hasn’t set character encoding and the request data is
> encoded with a different encoding than the default as described above,
> breakage can occur. To remedy this situation, a new method
> setCharacterEncoding(String enc) has been added to the ServletRequest
> interface. Developers can override the character encoding supplied by
> the container by calling this method. It must be called prior to parsing
> any post data or reading any input from the request. Calling
> this method once data has been read will not affect the encoding.
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> So as some others suggested, the best option is using one of the
> CharacterEncoding servlet filters and not to remedy this situation
> somewhere in C3.
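
The double encoding from the original question, and the getBytes("ISO-8859-1") workaround, can be reproduced with the standard library alone. This is a minimal sketch under assumptions: MojibakeRepairDemo, misdecode and repair are hypothetical names, and misdecode merely simulates a container that reads UTF-8 bytes as ISO-8859-1. Unlike the one-argument new String(byte[]) in the workaround, which relies on the platform default charset, the sketch names UTF-8 explicitly when decoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Cocoon or the Servlet API.
class MojibakeRepairDemo {

    // Simulate the container default: the client's UTF-8 bytes are
    // interpreted as ISO-8859-1, one character per byte.
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Undo the damage: ISO-8859-1 maps every character back to the byte
    // it came from, so the original UTF-8 bytes can be decoded properly.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00e9"; // 'é', two bytes in UTF-8
        String garbled = misdecode(original);

        // Two characters now, which take four bytes when encoded as UTF-8.
        System.out.println(garbled.length());                                // prints 2
        System.out.println(garbled.getBytes(StandardCharsets.UTF_8).length); // prints 4
        System.out.println(repair(garbled).equals(original));                // prints true
    }
}
```

The repair only works if nothing else touched the string in between; setting the request encoding in a filter before the parameters are parsed avoids the problem instead of patching it per parameter.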