Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cocoon Wiki" for change 
notification.

The following page has been changed by AlexanderKlimetschek:
http://wiki.apache.org/cocoon/RequestParameterEncoding

------------------------------------------------------------------------------
  = Request parameter encoding =
  
+ == How to set everything to UTF-8 with Cocoon and CForms (with Ajax and Dojo) ==
+ 
+ The most robust approach to internationalization is to handle everything in UTF-8, since it can encode every Unicode character. "Everything" means the server side (backend, XML), HTTP requests and responses, and the client side with forms and dojo.io.bind.
+ 
+ === 1. Sending all pages in UTF-8 ===
+ 
+ You need to configure Cocoon's serializers to use UTF-8, in particular the XML serializer ({{{<serialize type="xml" />}}}) and the HTML serializer ({{{<serialize type="html" />}}}). To support all browsers, you must state the encoding used for the response body (the {{{charset}}} in the mime-type) and also include a meta tag in the HTML: {{{<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">}}}. This is very important, because the browser will then send form requests encoded in UTF-8. Browsers normally do not mention the encoding in the request, so you have to assume they are doing it right. The following serializer configuration for your sitemaps does that:
+ 
+ {{{
+ <serializer name="xml" mime-type="text/xml"
+   src="org.apache.cocoon.serialization.XMLSerializer">
+   <encoding>UTF-8</encoding>
+ </serializer>
+ 
+ <serializer name="html" mime-type="text/html; charset=UTF-8"
+   src="org.apache.cocoon.serialization.HTMLSerializer">
+   <encoding>UTF-8</encoding>
+ 
+   <!-- the following common doctype is only included for completeness, it has no impact on encoding -->
+   <doctype-public>-//W3C//DTD HTML 4.01 Transitional//EN</doctype-public>
+   <doctype-system>http://www.w3.org/TR/html4/loose.dtd</doctype-system>
+ </serializer>
+ }}}
+ 
+ === 2. AJAX Requests with CForms/Dojo ===
+ 
+ If you use CForms with Ajax enabled, Cocoon uses dojo.io.bind() under the hood, which creates XMLHttpRequests that POST the form data to the server. In this case Dojo chooses the encoding itself by default, which does not match the browser's behaviour of using the charset defined in the META tag. You can easily tell Dojo which encoding to use for all dojo.io.bind() calls: include the following at the top of your HTML pages, before dojo.js is included:
+ 
+ {{{
+ <script>djConfig = { bindEncoding: "utf-8" };</script>
+ }}}
+ 
+ If you already have other djConfig options, simply add the {{{bindEncoding}}} property to the existing object.
+ 
+ === 3. Decoding incoming requests: Servlet Container ===
+ 
+ When the browser sends data to your server, e.g. form data, the {{{ServletRequest}}} is created by your servlet container, which needs to decode the parameters correctly into Java Strings. If the encoding is specified in the HTTP request header, the container will use it, but unfortunately this is typically not the case. When the browser sends a form post, it only states {{{application/x-www-form-urlencoded}}} in the header. So you have to assume an encoding here, and the right thing to assume is the encoding of the page you originally sent to the browser.
+ 
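To illustrate why the assumed encoding matters, here is a minimal sketch that decodes the same percent-encoded form value once as UTF-8 and once as ISO-8859-1. The class name {{{ParameterDecodingDemo}}} and the sample value are made up for illustration; the sketch uses the JDK's {{{URLDecoder}}} rather than a servlet container, so it only approximates what a container does internally.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class, not part of Cocoon or the servlet API.
class ParameterDecodingDemo {

    /** Decodes a percent-encoded form value using the given charset name. */
    static String decode(String rawValue, String charsetName) {
        try {
            return URLDecoder.decode(rawValue, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // "Mueller" with u-umlaut, as a browser would send it from a UTF-8 page
        String raw = "M%C3%BCller";

        // Correct assumption: the two bytes C3 BC become one character (U+00FC)
        System.out.println(decode(raw, "UTF-8"));

        // Wrong assumption: each byte becomes its own character (U+00C3, U+00BC)
        System.out.println(decode(raw, "ISO-8859-1"));
    }
}
```

With the wrong assumption the name arrives one character longer than it was typed, which is the typical symptom of this kind of mismatch.
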
+ The servlet specification says that the default encoding for incoming requests is ISO-8859-1 (Jetty deviates from the specification here and assumes UTF-8 by default). To make sure UTF-8 is used for parameter decoding, you have to set that encoding explicitly by calling {{{ServletRequest.setCharacterEncoding()}}} before any parameter is read. To do that for all your requests, you can use a servlet filter like this one: SetCharacterEncodingFilter.
+ 
+ Then you add the filter to the web.xml:
+ 
+ {{{
+ <filter>
+   <filter-name>Set Character Encoding</filter-name>
+   <filter-class>filters.SetCharacterEncodingFilter</filter-class>
+   <init-param>
+     <param-name>encoding</param-name>
+     <param-value>UTF-8</param-value>
+   </init-param>
+ </filter>
+ 
+ <!-- either mapping to URL pattern -->
+ 
+ <filter-mapping>
+   <filter-name>Set Character Encoding</filter-name>
+   <url-pattern>/*</url-pattern>
+ </filter-mapping>
+ 
+ <!-- or mapping to your Cocoon servlet (the servlet-name might be different) -->
+ 
+ <filter-mapping>
+   <filter-name>Set Character Encoding</filter-name>
+   <servlet-name>CocoonBlocksDispatcherServlet</servlet-name>
+ </filter-mapping>
+ 
+ }}}
+ 
+ Since the filter element was added in the servlet 2.3 specification, you need at least version 2.3 in your web.xml. Using the current 2.4 version is better, as it is the standard for Cocoon web applications. For 2.4 you use an XSD schema:
+ 
+ {{{
+ <web-app version="2.4"
+          xmlns="http://java.sun.com/xml/ns/j2ee"
+          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+          xsi:schemaLocation="http://java.sun.com/xml/ns/j2ee http://java.sun.com/xml/ns/j2ee/web-app_2_4.xsd">
+ }}}
+ 
+ For 2.3 you need to modify the DOCTYPE declaration in the web.xml:
+ 
+ {{{
+ <!DOCTYPE web-app
+     PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
+     "http://java.sun.com/dtd/web-app_2_3.dtd">
+ }}}
+ 
+ === 4. Setting Cocoon's encoding (especially CForms) ===
+ 
+ To tell Cocoon to use UTF-8 internally, you have to set two properties:
+ 
+ {{{
+ org.apache.cocoon.containerencoding=utf-8
+ org.apache.cocoon.formencoding=utf-8
+ }}}
+ 
+ They need to be in some {{{*.properties}}} file under 
{{{META-INF/cocoon/properties}}} in one of your blocks.
+ 
+ === 5. XML Files ===
+ 
+ This is normally not a problem, since the default encoding for XML files is UTF-8. However, they should always start with the following XML declaration, which states the encoding explicitly and should make your XML editor save them in UTF-8 (most editors appear to do that, so there should not be a problem here).
+ 
+ {{{
+ <?xml version="1.0" encoding="UTF-8"?>
+ }}}
+ 
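The declaration only helps if it matches the bytes actually stored in the file. The following sketch, using the JDK's built-in JAXP parser, parses the same small document once stored as UTF-8 and once stored as ISO-8859-1, each with a matching declaration. The class name {{{XmlDeclarationDemo}}} and the sample document are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical demo class, not part of Cocoon.
class XmlDeclarationDemo {

    /** Parses raw XML bytes and returns the text content of the root element. */
    static String parseText(byte[] xmlBytes) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String body = "<name>M\u00fcller</name>";

        // Declaration and bytes agree on UTF-8
        byte[] utf8 = ("<?xml version=\"1.0\" encoding=\"UTF-8\"?>" + body)
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(parseText(utf8));

        // Declaration and bytes agree on ISO-8859-1; the parser follows the declaration
        byte[] latin1 = ("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" + body)
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parseText(latin1));
    }
}
```

If the declaration and the stored bytes disagree, a parser will typically either reject the file or silently produce garbled characters, so it is worth checking both when an editor is suspected of saving in a different encoding.
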
+ === 6. Special Transformers ===
+ 
+ The standard XSLT transformers and others work on SAX events, which are not serialized, so encoding is not a problem there. But some special transformers pass data on to another library that does serialize it and might need a hint to use the correct encoding. One example is the NekoHTMLTransformer: https://issues.apache.org/jira/browse/COCOON-2063.
+ 
+ If you suspect that a transformer in your pipeline is doing things wrong, add a {{{TeeTransformer}}} between each step, writing the XML between the transformers into temp1.xml, temp2.xml and so on, and look for the place where your umlauts and special characters get messed up.
+ 
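When looking through such dumps, the most common damage is UTF-8 bytes that were read as ISO-8859-1 somewhere along the way. The following heuristic sketch flags strings that look like that: it re-encodes the text as ISO-8859-1 and checks whether the resulting bytes form valid, shorter UTF-8. The class {{{MojibakeCheck}}} and its method are hypothetical helpers, not part of Cocoon, and the check can produce false positives on unusual Latin-1 text.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for inspecting dumped pipeline output.
class MojibakeCheck {

    /**
     * Returns true if the text looks like UTF-8 bytes that were
     * wrongly decoded as ISO-8859-1 (a heuristic, not a proof).
     */
    static boolean looksDoubleEncoded(String text) {
        // The round trip only makes sense if every character fits into ISO-8859-1
        if (!StandardCharsets.ISO_8859_1.newEncoder().canEncode(text)) {
            return false;
        }
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        CharsetDecoder strictUtf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            String repaired = strictUtf8.decode(ByteBuffer.wrap(bytes)).toString();
            // Multi-byte sequences collapse into single characters when repaired
            return repaired.length() < text.length();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8, so the text was probably decoded correctly
            return false;
        }
    }
}
```

Running this over the text content of temp1.xml, temp2.xml and so on should point to the first step after which the check starts returning true.
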
+ === 7. Your own XML serializing Sources ===
+ 
+ If you have your own Source implementation that needs to serialize XML, make 
sure it will do that in UTF-8 as well. A good idea is to use Cocoon's XML 
serializer, since we already configured that one to UTF-8 above. Sample code 
that does that is here: ["UseCocoonXMLSerializerCode"]
+ 
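Where Cocoon's XML serializer is not available, the same goal can be reached with the JDK alone. The sketch below is an alternative to the linked sample, not a copy of it: it uses the standard JAXP {{{Transformer}}} with an explicit UTF-8 output encoding. The class name {{{Utf8XmlWriter}}} and the sample document are assumptions made for illustration.

```java
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, not part of Cocoon.
class Utf8XmlWriter {

    /** Serializes a DOM document to bytes, forcing UTF-8 as output encoding. */
    static byte[] serialize(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // Set the encoding explicitly instead of relying on defaults
            transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // Writing to a byte stream lets the serializer do the encoding itself
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize XML", e);
        }
    }

    /** Builds a one-element sample document with the given text content. */
    static Document sample(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element root = doc.createElement("name");
            root.setTextContent(text);
            doc.appendChild(root);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("Could not build document", e);
        }
    }
}
```

The important design choice is to hand the serializer an {{{OutputStream}}} rather than a {{{Writer}}}: with a {{{Writer}}}, the bytes would be produced by the writer's own charset, which may not match the encoding named in the XML declaration.
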
+ 
+ == Older documentation ==
+ 
- == Basics ==
+ === Basics ===
  
  If your Cocoon application needs to read request parameters that could 
contain ''special'' characters, i.e. characters outside of the first 128 ASCII 
characters, you'll need to pay attention to what encoding is used.