Page: http://wiki.cocoondev.org/Wiki.jsp?page=RequestParameterEncoding , 
version: 1 on Thu Mar 13 17:13:46 2003 by 157.193.121.51

New page created:
+ !!!Request parameter encoding
+ 
+ !!Basics
+ 
+ If your Cocoon application needs to read request parameters that could 
contain "special" characters, i.e. characters outside the 7-bit ASCII range 
(the first 128 code points), you'll need to pay attention to which encoding is used.
+ 
+ Normally a browser will send data to the server using the same encoding as 
the page containing the submitted form (or whatever). So if the pages are 
serialized using UTF-8, the browser will submit form data using UTF-8. The user 
can change the encoding, but it's quite safe to assume he/she won't do that 
(have you ever done it?).
+ 
+ After doing some tests with popular browsers, I've noticed that usually 
browsers will not let the server know what encoding they used to encode the 
parameters, so we need to make sure ourselves that the encoding used when 
serializing pages corresponds to the encoding used when decoding request 
parameters.
+ 
+ First of all, check in the sitemap what encoding is used when serializing 
HTML pages:
+ 
+ {{{
+ <map:serializer logger="sitemap.serializer.html" mime-type="text/html"
+        name="html" pool-grow="4" pool-max="32" pool-min="4"
+        src="org.apache.cocoon.serialization.HTMLSerializer">
+   <buffer-size>1024</buffer-size>
+   <encoding>UTF-8</encoding>
+ </map:serializer>
+ }}}
+ 
+ In the example above, UTF-8 is the encoding used. This is a widely supported 
Unicode encoding, so it is often a good choice.
+ 
+ The HTML serializer will automatically insert a <meta> tag into the HTML 
page's HEAD element specifying the encoding. Most browsers apparently require 
this. The HTML serializer will however only do this if your page already
+ contains a HEAD (or head) element, so make sure it has one. The <meta> 
element inserted by the serializer will then look as follows:
+ 
+ {{{
+ <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+ }}}
+ 
+ By default, if the browser doesn't explicitly state the encoding, a 
servlet container will decode request parameters using the ISO-8859-1 encoding 
(independent of the platform on which the container is running). So in the 
above case where UTF-8 was used when serializing, we would be facing problems.
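+ 
+ The mismatch described above can be reproduced outside a servlet container. The following is only a sketch: the class and method names are made up for illustration, and java.net.URLEncoder / URLDecoder stand in for what the browser and the container do with a form value.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class, not part of Cocoon or any container.
class EncodingMismatchDemo {

    /** Encodes a form value the way a browser would for a page served in pageEncoding. */
    public static String submit(String value, String pageEncoding) {
        try {
            return URLEncoder.encode(value, pageEncoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + pageEncoding);
        }
    }

    /** Decodes the submitted value the way a container assuming containerEncoding would. */
    public static String receive(String wireValue, String containerEncoding) {
        try {
            return URLDecoder.decode(wireValue, containerEncoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + containerEncoding);
        }
    }

    public static void main(String[] args) {
        String typed = "caf\u00e9";            // "cafe" with an e-acute
        String wire = submit(typed, "UTF-8");  // what travels over HTTP
        System.out.println(wire);              // caf%C3%A9
        // Same encoding on both sides: the value survives.
        System.out.println(receive(wire, "UTF-8").equals(typed));      // true
        // Container default (ISO-8859-1): two wrong characters replace the e-acute.
        System.out.println(receive(wire, "ISO-8859-1").equals(typed)); // false
    }
}
```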
+ 
+ The encoding to use when decoding request parameters can be configured in 
web.xml by supplying two init parameters to the Cocoon servlet: 
"container-encoding" and "form-encoding". The container-encoding parameter 
states which encoding the container used when it decoded the request 
parameters (normally ISO-8859-1), and the form-encoding parameter states the 
encoding the browser actually used. Here's an example of how to specify the 
parameters in web.xml:
+ 
+ {{{
+ <init-param>
+   <param-name>container-encoding</param-name>
+   <param-value>ISO-8859-1</param-value>
+ </init-param>
+ <init-param>
+   <param-name>form-encoding</param-name>
+   <param-value>UTF-8</param-value>
+ </init-param>
+ }}}
+ 
+ For Java-insiders: what Cocoon actually does internally is apply the 
following trick to get a parameter correctly decoded: suppose "value" is a 
string containing a request parameter, then Cocoon will do:
+ 
+ {{{
+ value = new String(value.getBytes("ISO-8859-1"), "UTF-8");
+ }}}
+ 
+ So it recodes the incorrectly decoded string back to bytes and decodes it 
using the correct encoding.
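+ 
+ The recoding step above can be tried in isolation. This is a minimal sketch with an illustrative class and method name (not Cocoon's actual code); it simply wraps the one-liner shown above so the two encodings become parameters.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical helper, named for illustration only.
class RecodeDemo {

    /**
     * Turns a string that was decoded with containerEncoding back into bytes
     * and decodes those bytes again with formEncoding.
     */
    public static String recode(String value, String containerEncoding, String formEncoding) {
        try {
            return new String(value.getBytes(containerEncoding), formEncoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        // UTF-8 bytes C3 A9 (e-acute) mis-decoded as ISO-8859-1 give two characters.
        String misdecoded = "caf\u00c3\u00a9";
        String repaired = recode(misdecoded, "ISO-8859-1", "UTF-8");
        System.out.println(repaired.equals("caf\u00e9")); // true
        // Plain ASCII is identical in both encodings, so it passes through unchanged.
        System.out.println(recode("abc", "ISO-8859-1", "UTF-8")); // abc
    }
}
```

+ The round trip is lossless here because ISO-8859-1 maps every byte value to exactly one character. Note that the trick should only be applied to strings that really were mis-decoded: running it on a value that was already decoded correctly would typically corrupt it.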
+ 
+ !!Locally overriding the form-encoding
+ 
+ Cocoon is ideally suited for publishing to different kinds of devices, and 
certain devices may well require different encodings. In this case, you can 
redefine the form-encoding for specific pipelines using the 
SetCharacterEncodingAction.
+ 
+ To use it, first of all make sure the action is declared in the map:actions 
element of the sitemap:
+ {{{
+ <map:action name="set-encoding"
+        src="org.apache.cocoon.acting.SetCharacterEncodingAction"/>
+ }}}
+ 
+ and then call the action at the required location as follows:
+ {{{
+ <map:act type="set-encoding">
+   <map:parameter name="form-encoding" value="some-other-encoding"/>
+ </map:act>
+ }}}
+ 
+ !!Problems with components using the original HttpServletRequest 
(JSPGenerator, ...)
+ 
+ Some components such as the JSPGenerator use the original HttpServletRequest 
object instead of the Cocoon Request object. In that case request parameters 
will not be decoded correctly (this matters, for example, when the JSP page 
itself reads request parameters).
+ 
+ One possible solution would be to patch these components to use a wrapper 
class that delegates all calls to the HttpServletRequest object, except for the 
getParameter or getParameterValues methods, which should be delegated to 
Cocoon's Request object.
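+ 
+ The wrapper idea can be sketched as follows. To keep the example self-contained, the two interfaces below are hypothetical stand-ins for HttpServletRequest and for the parameter methods of Cocoon's Request; a real patch would wrap the actual servlet request (in a Servlet 2.3 container, javax.servlet.http.HttpServletRequestWrapper could serve as the base class).

```java
// Hypothetical sketch: none of these types are real Cocoon or servlet API classes.
class ParameterWrapperSketch {

    /** Stand-in for the few HttpServletRequest methods used in this sketch. */
    interface RawRequest {
        String getParameter(String name);
        String[] getParameterValues(String name);
        String getHeader(String name);
    }

    /** Stand-in for the parameter methods of Cocoon's Request object. */
    interface DecodedParameters {
        String getParameter(String name);
        String[] getParameterValues(String name);
    }

    /** Delegates everything to the original request, except the parameter getters. */
    static class Wrapper implements RawRequest {
        private final RawRequest original;
        private final DecodedParameters decoded;

        Wrapper(RawRequest original, DecodedParameters decoded) {
            this.original = original;
            this.decoded = decoded;
        }

        public String getParameter(String name) {
            return decoded.getParameter(name);        // correctly decoded value
        }

        public String[] getParameterValues(String name) {
            return decoded.getParameterValues(name);  // correctly decoded values
        }

        public String getHeader(String name) {
            return original.getHeader(name);          // plain delegation
        }
    }

    /** Wraps two canned objects and returns what a component such as a JSP would see. */
    public static String[] demo() {
        RawRequest raw = new RawRequest() {
            public String getParameter(String name) { return "caf\u00c3\u00a9"; }
            public String[] getParameterValues(String name) { return new String[] { "caf\u00c3\u00a9" }; }
            public String getHeader(String name) { return "text/html"; }
        };
        DecodedParameters cocoon = new DecodedParameters() {
            public String getParameter(String name) { return "caf\u00e9"; }
            public String[] getParameterValues(String name) { return new String[] { "caf\u00e9" }; }
        };
        RawRequest wrapped = new Wrapper(raw, cocoon);
        return new String[] {
            wrapped.getParameter("q"),
            wrapped.getParameterValues("q")[0],
            wrapped.getHeader("Accept")
        };
    }
}
```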
+ 
+ There's an easier solution that can be applied right away if your servlet 
container supports the Servlet 2.3 specification. Starting from 2.3, the 
Servlet specification allows you to explicitly set the encoding to be used for 
decoding request parameters, though this has to happen before the first request 
data is read. Since Cocoon reads request parameters itself (such as 
cocoon-reload), this would require modification of the CocoonServlet. But it 
can also be done using a servlet filter. Tomcat 4 contains just such a filter 
in its "examples" webapp. Look for the file 
jakarta-tomcat/webapps/examples/WEB-INF/classes/filters/SetCharacterEncodingFilter.java. 
Compile it (with servlet.jar in the classpath), put it in a jar (keeping the 
filters package directory structure) and put the jar in your webapp's 
WEB-INF/lib directory.
+ 
+ Now modify your webapp's web.xml file to include the following (after the 
display-name and description elements, but before the servlet element):
+ 
+ {{{
+ <filter>
+   <filter-name>Set Character Encoding</filter-name>
+   <filter-class>filters.SetCharacterEncodingFilter</filter-class>
+   <init-param>
+     <param-name>encoding</param-name>
+     <param-value>UTF-8</param-value>
+   </init-param>
+ </filter>
+ 
+ <filter-mapping>
+   <filter-name>Set Character Encoding</filter-name>
+   <url-pattern>/*</url-pattern>
+ </filter-mapping>
+ }}}
+ 
+ Since the filter element is new in the servlet 2.3 specification, you might 
need to modify the DOCTYPE declaration in the web.xml:
+ 
+ {{{
+ <!DOCTYPE web-app
+     PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
+     "http://java.sun.com/dtd/web-app_2_3.dtd">
+ }}}
+ 
+ Of course, when using a servlet filter to set the encoding, you should not 
supply the form-encoding init parameter anymore in the web.xml. You could still 
supply the container-encoding parameter, though its value will now have to be 
the same as the encoding supplied to the filter. This will allow you to 
override the form-encoding using the SetCharacterEncodingAction, though only 
for the Cocoon Request object.
+ 
+ Using a servlet filter also has the advantage that it will work for any 
servlet.  Suppose your webapp consists of multiple servlets, with Cocoon being 
only one of them.  Sometimes the processing could start in another servlet 
(which sets the character encoding correctly) and then be forwarded to Cocoon, 
while other times the processing could start immediately in the Cocoon servlet. 
It would then be impossible to know in Cocoon whether the request parameter 
encoding needs to be corrected or not.
+ 


Page: http://wiki.cocoondev.org/Wiki.jsp?page=BrunoDumon , version: 2 on Thu 
Mar 13 17:17:08 2003 by 157.193.121.51

- * [ImplementingTransformers]
+ * [DevelopingComponents] and [ImplementingTransformers]
+ * [RequestParameterEncoding]

