Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The following page has been changed by KurosakaTeruhiko:
http://wiki.apache.org/nutch/GettingNutchRunningWithUtf8

New page:
= How to Configure App Servers to Pass non-ASCII Characters? =
Nutch GUI uses the GET method to pass the query strings to the server.  Tomcat 
4 and 5 need to be configured to enable passing of non-ASCII characters.

Note that this note describes how to make Tomcat pass non-ASCII characters.  
Nutch, in its "factory set" configuration, handle only limited characters.  
Especially, it will not handle Chinese/Japanese/Korean text properly.  (Each 
CJK character is treated as if it were a word by itself.)


== Tomcat 4 and Tomcat 5 ==
Tomcat changed its "factory set" configuration to allow only the ISO 8859-1 
encoding to be used in the GET method.  See 
http://issues.apache.org/bugzilla/show_bug.cgi?id=29900 for the rationale for 
the change.

To enable passing of UTF-8 characters, edit $TOMCAT/conf/server.xml.  Locate 
the <Connector> tag for the web (look for "8080") and insert this parameter 
assignment:
{{{URIEncoding="UTF-8"}}}
as explained in Tomcat 5 FAQ at 
http://tomcat.apache.org/faq/connectors.html#utf8


''Contributors, please add special configurations needed for other app servers 
below.''

Reply via email to