Page: http://wiki.cocoondev.org/Wiki.jsp?page=RequestParameterEncoding ,
version: 1 on Thu Mar 13 17:13:46 2003 by 157.193.121.51
New page created:
+ !!!Request parameter encoding
+
+ !!Basics
+
+ If your Cocoon application needs to read request parameters that could
contain "special" characters, i.e. characters outside of the first 128 ASCII
characters, you'll need to pay attention to what encoding is used.
+
+ Normally a browser will send data to the server using the same encoding as
the page containing the submitted form. So if the pages are serialized using
UTF-8, the browser will submit form data using UTF-8. Users can change the
encoding in their browser, but it's quite safe to assume they won't do that
(have you ever done it?).
+
+ After doing some tests with popular browsers, I've noticed that usually
browsers will not let the server know what encoding they used to encode the
parameters, so we need to make sure ourselves that the encoding used when
serializing pages corresponds to the encoding used when decoding request
parameters.
+
+ First of all, check in the sitemap what encoding is used when serializing
HTML pages:
+
+ {{{
+ <map:serializer logger="sitemap.serializer.html" mime-type="text/html"
+     name="html" pool-grow="4" pool-max="32" pool-min="4"
+     src="org.apache.cocoon.serialization.HTMLSerializer">
+   <buffer-size>1024</buffer-size>
+   <encoding>UTF-8</encoding>
+ </map:serializer>
+ }}}
+
+ In the example above, UTF-8 is the encoding used. This is a widely supported
Unicode encoding, so it is often a good choice.
+
+ The HTML serializer will automatically insert a <meta> tag into the HTML
page's HEAD element specifying the encoding. Most browsers apparently require
this. The HTML serializer will however only do this if your page already
+ contains a HEAD (or head) element, so make sure it has one. The <meta>
element inserted by the serializer will then look as follows:
+
+ {{{
+ <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+ }}}
+
+ By default, if the browser doesn't explicitly mention the encoding, a
servlet container will decode request parameters using the ISO-8859-1 encoding
(independent of the platform on which the container is running). So in the
above case, where UTF-8 was used when serializing, any character outside of
ASCII would be decoded incorrectly.
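+
+ The mismatch can be reproduced outside of a servlet container. The following
is a minimal sketch, not Cocoon or container code: the class name
ParameterDecodingDemo and its decode helper are made up for illustration, and
java.net.URLDecoder merely stands in for whatever decoding the container does.
+
```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class (not part of Cocoon or any servlet container).
final class ParameterDecodingDemo {

    // Decodes a URL-encoded parameter value using the given charset name.
    static String decode(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // "café" as a browser would submit it from a UTF-8 page:
        // the é is sent as the two bytes C3 A9.
        String raw = "caf%C3%A9";

        // Decoded as UTF-8, the two bytes become one character.
        System.out.println(decode(raw, "UTF-8").length());      // prints 4

        // Decoded as ISO-8859-1, each byte becomes its own character.
        System.out.println(decode(raw, "ISO-8859-1").length()); // prints 5
    }
}
```
+
+ The lengths are printed rather than the strings themselves, so that the
result does not depend on the encoding of the console.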
+
+ The encoding to use when decoding request parameters can be configured in the
web.xml by supplying init parameters called "form-encoding" and
"container-encoding" to the Cocoon servlet. The container-encoding parameter
indicates which encoding the container used when it decoded the request
parameters (normally ISO-8859-1), and the form-encoding parameter indicates the
encoding the browser actually used. Here's an example of how to specify the
parameters in the web.xml:
+
+ {{{
+ <init-param>
+   <param-name>container-encoding</param-name>
+   <param-value>ISO-8859-1</param-value>
+ </init-param>
+ <init-param>
+   <param-name>form-encoding</param-name>
+   <param-value>UTF-8</param-value>
+ </init-param>
+ }}}
+
+ For Java insiders: internally, Cocoon applies the following trick to get a
parameter correctly decoded. Suppose "value" is a string containing a request
parameter; with the configuration above, Cocoon will then do:
+
+ {{{
+ value = new String(value.getBytes("ISO-8859-1"), "UTF-8");
+ }}}
+
+ So it encodes the incorrectly decoded string back to bytes and then decodes
those bytes using the correct encoding.
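+
+ The round trip can be tried in isolation. This is a minimal sketch assuming
the container decoded with ISO-8859-1 and the form was submitted as UTF-8; the
class RecodeDemo and its recode method are hypothetical names, not Cocoon's
actual code.
+
```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class illustrating the recoding trick described above.
final class RecodeDemo {

    // Turns the wrongly decoded string back into the bytes the container
    // received, then decodes those bytes with the encoding the form used.
    static String recode(String value, Charset containerEncoding, Charset formEncoding) {
        return new String(value.getBytes(containerEncoding), formEncoding);
    }

    public static void main(String[] args) {
        // Bytes as submitted by a browser from a UTF-8 page ("café").
        byte[] submitted = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // What a container produces when it assumes ISO-8859-1.
        String misdecoded = new String(submitted, StandardCharsets.ISO_8859_1);

        String fixed = recode(misdecoded, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(fixed.equals("caf\u00e9")); // prints true
    }
}
```
+
+ The trick is lossless here because ISO-8859-1 maps every possible byte value
to exactly one character, so getBytes returns the original bytes unchanged.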
+
+ !!Locally overriding the form-encoding
+
+ Cocoon is ideally suited for publishing to different kinds of devices, and
certain devices may well require different encodings. In this case, you can
redefine the form-encoding for specific pipelines using the
SetCharacterEncodingAction.
+
+ To use it, first of all make sure the action is declared in the map:actions
element of the sitemap:
+ {{{
+ <map:action name="set-encoding"
src="org.apache.cocoon.acting.SetCharacterEncodingAction"/>
+ }}}
+
+ and then call the action at the required location as follows:
+ {{{
+ <map:act type="set-encoding">
+ <map:parameter name="form-encoding" value="some-other-encoding"/>
+ </map:act>
+ }}}
+
+ !!Problems with components using the original HttpServletRequest
(JSPGenerator, ...)
+
+ Some components, such as the JSPGenerator, use the original
HttpServletRequest object instead of the Cocoon Request object. In that case,
the correct decoding of request parameters will not happen. This matters if,
for example, the JSP page itself reads request parameters.
+
+ One possible solution would be to patch these components to use a wrapper
class that delegates all calls to the HttpServletRequest object, except for the
getParameter and getParameterValues methods, which should be delegated to
Cocoon's Request object.
+
+ There's an easier solution that can be applied right away if your servlet
container supports the Servlet 2.3 specification. Starting with 2.3, the
Servlet specification allows you to explicitly set the encoding to be used for
decoding request parameters (via the request's setCharacterEncoding method),
though this has to happen before any request data is read. Since Cocoon reads
request parameters itself (such as cocoon-reload), this would require
modification of the CocoonServlet. But it can also be done using a servlet
filter. Tomcat 4 contains just such a filter in its "examples" webapp. Look for
the file
jakarta-tomcat/webapps/examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java.
Compile it (with servlet.jar in the classpath), put it in a jar (keeping the
"filters" package directory) and put the jar in your webapp's WEB-INF/lib
directory.
+
+ Now modify your webapp's web.xml file to include the following (after the
display-name and description elements, but before the servlet element):
+
+ {{{
+ <filter>
+   <filter-name>Set Character Encoding</filter-name>
+   <filter-class>filters.SetCharacterEncodingFilter</filter-class>
+   <init-param>
+     <param-name>encoding</param-name>
+     <param-value>UTF-8</param-value>
+   </init-param>
+ </filter>
+
+ <filter-mapping>
+   <filter-name>Set Character Encoding</filter-name>
+   <url-pattern>/*</url-pattern>
+ </filter-mapping>
+ }}}
+
+ Since the filter element is new in the Servlet 2.3 specification, you might
need to modify the DOCTYPE declaration in the web.xml:
+
+ {{{
+ <!DOCTYPE web-app
+ PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
+ "http://java.sun.com/dtd/web-app_2_3.dtd">
+ }}}
+
+ Of course, when using a servlet filter to set the encoding, you should no
longer supply the form-encoding init parameter in the web.xml. You could still
supply the container-encoding parameter, though its value will now have to be
the same as the encoding supplied to the filter. This will allow you to
override the form-encoding using the SetCharacterEncodingAction, though only
for the Cocoon Request object.
+
+ Using a servlet filter also has the advantage that it will work for any
servlet. Suppose your webapp consists of multiple servlets, with Cocoon being
only one of them. Sometimes the processing could start in another servlet
(which sets the character encoding correctly) and then be forwarded to Cocoon,
while other times the processing could start immediately in the Cocoon servlet.
It would then be impossible to know in Cocoon whether the request parameter
encoding needs to be corrected or not.
+
Page: http://wiki.cocoondev.org/Wiki.jsp?page=BrunoDumon , version: 2 on Thu
Mar 13 17:17:08 2003 by 157.193.121.51
- * [ImplementingTransformers]
+ * [DevelopingComponents] and [ImplementingTransformers]
+ * [RequestParameterEncoding]