Have you verified that your form inputs are getting to your query objects without the String being mangled due to encoding problems?
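A quick way to check for this kind of mangling is to dump the code points of the query string as your servlet sees it. The following is only a minimal sketch: the class and method names (`QueryEncodingCheck`, `redecodeAsUtf8`, `toCodePoints`) are made up for illustration, and it assumes the common failure mode where the container decodes UTF-8 bytes as ISO-8859-1. It is not necessarily the technique from the post linked below.

```java
import java.nio.charset.StandardCharsets;

public class QueryEncodingCheck {

    // Assumption: the container decoded the browser's UTF-8 bytes as ISO-8859-1.
    // ISO-8859-1 maps every byte to exactly one char, so the original bytes
    // can be recovered and decoded again with the right charset.
    public static String redecodeAsUtf8(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Render each UTF-16 code unit as U+XXXX so mangling is visible even
    // when the console itself cannot display the characters.
    public static String toCodePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Russian word for "search", written with escapes to avoid
        // depending on the encoding of this source file.
        String typed = "\u043F\u043E\u0438\u0441\u043A";

        // Simulate the mis-decoding: UTF-8 bytes read as ISO-8859-1.
        String mangled = new String(typed.getBytes(StandardCharsets.UTF_8),
                                    StandardCharsets.ISO_8859_1);

        System.out.println("mangled:   " + toCodePoints(mangled));
        System.out.println("recovered: " + toCodePoints(redecodeAsUtf8(mangled)));
    }
}
```

If the code points you print for a Russian query are all below U+0100, the string was most likely decoded with the wrong charset before it reached your Query objects.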
I'm getting Japanese in UTF-8 and use the technique described at http://w6.metronet.com/~wjm/tomcat/2001/Aug/msg00230.html to get the data from the browser to Lucene. I build my index using the HTMLParser in the Lucene demos and give it a Reader object created from an InputStreamReader that specifies the HTML file's encoding (Shift_JIS in my case). There are a bunch of other issues I'm working on to support Japanese, but I'm getting search results at this point.

The two places where encodings should come into play for you are:

1. parsing your source content into the Reader or String that you use to create org.apache.lucene.document.Field objects, and
2. getting the user query from their browser to the Query objects.

Eric

--
Eric D. Isakson             SAS Institute Inc.
Application Developer       SAS Campus Drive
XML Technologies            Cary, NC 27513
(919) 531-3639              http://www.sas.com

-----Original Message-----
From: MERCIER ALEXANDRE [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 18, 2003 11:36 AM
To: [EMAIL PROTECTED]
Subject: Indexing and searching non-latin languages using utf-8

Hi all,

I have a problem with indexing and then searching docs written in non-Latin languages and encoded in UTF-8 (Russian, for example).

I have a web application with a simple form to search the contents of the docs. When I submit the form, I encode the query term in UTF-8 with encodeURI(String), but I match no doc.

I think this is due to bad indexing, but I'm not sure. As far as I know, Lucene indexes docs by writing Terms to the 'xxx.tis' file, encoded in UTF-8. When it reads the source file, it correctly gets the Russian characters (2 bytes each), but once they are written to the index they seem different (I've listed the terms in my application console).

If someone has a solution to my problem, any advice is welcome.

Thanks.
Alex

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
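
On the indexing side discussed in the reply above, the key step is opening the source file through an InputStreamReader with an explicit charset instead of relying on the platform default. The following is a rough sketch only: `SourceEncodingReader`, `openReader` and `readAll` are hypothetical names, and the Lucene calls are deliberately left out. The resulting Reader is what you would then hand to your parser or Field.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

public class SourceEncodingReader {

    // Open a file with an explicitly named charset (for example "Shift_JIS"
    // or "UTF-8") rather than the platform default encoding.
    public static Reader openReader(Path file, String charsetName) throws IOException {
        return new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), Charset.forName(charsetName)));
    }

    // Helper for this demo only: drain a Reader into a String so the
    // decoded text can be inspected.
    public static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Japanese word for "search", written with escapes.
        String text = "\u691C\u7D22";

        // Write a sample file in Shift_JIS. This assumes the JRE ships the
        // extended charsets, which standard JDK distributions do.
        Path file = Files.createTempFile("sample", ".html");
        Files.write(file, text.getBytes(Charset.forName("Shift_JIS")));

        try (Reader reader = openReader(file, "Shift_JIS")) {
            // Reports whether the decoded text matches what was written.
            System.out.println(readAll(reader).equals(text));
        } finally {
            Files.delete(file);
        }
    }
}
```

Reading the same file with the wrong charset name generally does not fail. It silently produces different characters, which would match the symptom of terms looking different once they are in the index.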
