You might want to check out this page http://wiki.apache.org/solr/SolrTomcat
Out of the box, Tomcat needs a small config change to properly support UTF-8.

Thanks,
Charlie

-----Original Message-----
From: Mario Knezovic [mailto:[EMAIL PROTECTED]]
Sent: Friday, August 17, 2007 12:58 PM
To: solr-user@lucene.apache.org
Subject: UTF-8 encoding problem on one of two Solr setups

Hi all,

I have set up identical Solr 1.1 installations on two different machines. One works fine, the other one has a UTF-8 encoding problem.

#1 is my local Windows XP machine. Solr is running basically in a configuration like the tutorial example, with Jetty/5.1.11RC0 (Windows XP/5.1 x86 java/1.6.0). Everything works as expected here.

#2 is a Linux machine with Solr running inside Tomcat 6. The problem happens here. This is where Solr will ultimately run.

To rule out problems in my PHP and Java code, I reproduced the problem with the Solr admin page, and it happens there as well. (Tested with Firefox 2 with the site's character encoding set to UTF-8.)

When entering an arbitrary search string containing UTF-8 chars, I get a correct response from the local Windows Solr setup:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">München</str>   <-- sample string containing a German umlaut-u
      <str name="rows">10</str>
      <str name="version">2.2</str>
    </lst>
  </lst>
[...]

When I do exactly the same on the admin page of the other Solr setup (from exactly the same browser), I get the following response:

[...]
<str name="q">item$searchstring_de:MÃ¼nchen</str>
[...]

Obviously the umlaut-u UTF-8 bytes 0xC3 0xBC had been interpreted as two 8-bit chars instead of one UTF-8 char.

Unfortunately I am pretty new to Solr, Tomcat and related topics, so I have not been able to find the problem yet. My guess is that it is outside of Solr, maybe in the Tomcat configuration, but I have spent the entire day on it without getting any further.
But apart from that Solr really rocks. Indexing tons of content and searching works just fine and fast and it was pretty easy to get into everything. Now I am changing all data to UTF-8 and ran into my first serious obstacle... after a few weeks of Solr usage!

Any hint/help appreciated. Thank you very much.

Mario
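
The symptom described in the message above can be reproduced outside of Solr and Tomcat. The following is only an illustrative sketch, not anything taken from either setup: it assumes the server decoded the request bytes as ISO-8859-1 (a guess consistent with "two 8-bit chars instead of one UTF-8 char"), and the helper names misdecode and repair are made up for this example. For ü (U+00FC) the UTF-8 encoding is the two bytes 0xC3 0xBC.

```python
# Sketch of the mis-decoding described in the thread.
# Assumption: the receiving side reads UTF-8 bytes as ISO-8859-1.
# The function names below are hypothetical, chosen for illustration only.

def misdecode(text: str, wrong_charset: str = "iso-8859-1") -> str:
    """Encode text as UTF-8, then decode those bytes with the wrong charset."""
    return text.encode("utf-8").decode(wrong_charset)


def repair(garbled: str, wrong_charset: str = "iso-8859-1") -> str:
    """Undo misdecode(): recover the original bytes and decode them as UTF-8."""
    return garbled.encode(wrong_charset).decode("utf-8")


query = "M\u00fcnchen"              # "München", written with an escape for ü
utf8_bytes = query.encode("utf-8")
print(utf8_bytes.hex(" "))          # 4d c3 bc 6e 63 68 65 6e

garbled = misdecode(query)          # the ü became two characters: Ã and ¼
print(ascii(garbled))               # 'M\xc3\xbcnchen'
print(len(query), len(garbled))     # 7 8

print(repair(garbled) == query)     # True
```

The round trip through repair only works because ISO-8859-1 maps every byte to exactly one character; it is useful for diagnosis, not as a fix. If the cause is what the reply at the top suggests, the setting usually involved is the URIEncoding="UTF-8" attribute on the Connector element in Tomcat's conf/server.xml. Treat that as an assumption to check against the linked wiki page.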