Re: Urgent, please help Index/Search in UTF-8 ???

Karl Øie Mon, 11 Apr 2005 03:09:49 -0700

If you use a servlet and an HTML form to feed queries to the QueryParser, take good care of the encoding configuration around the servlet container. If you, like me, use Tomcat, you might have to re-decode the query string as UTF-8 before you pass it to Lucene (Java strings themselves are UTF-16 internally, so what matters is that the incoming bytes are decoded with the right charset).


Read this:

http://www.crazysquirrel.com/compgen/form-encoding.php


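Then, in your receiving servlet:

// Assumes Tomcat decoded the parameter as ISO-8859-1 (its default).
String raw_query = request.getParameter("query");

// getCharacterEncoding() returns null when the browser sent no charset, so fall back to UTF-8.
String encoding = request.getCharacterEncoding() != null ? request.getCharacterEncoding() : "UTF-8";
String query_string = new String(raw_query.getBytes("ISO-8859-1"), encoding);

Then pass query_string to Lucene. This ensures that the bytes behind the string fetched by getParameter() are decoded with the right encoding.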
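
To see why the re-decoding step works, here is a small standalone sketch that uses only the JDK, so it runs without a servlet container. The class name QueryRecoder and the method recode are made up for this illustration, and the sketch assumes the container decoded UTF-8 bytes as ISO-8859-1; if your container already decodes correctly, applying this would damage the string instead of repairing it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Lucene or the servlet API.
final class QueryRecoder {

    /**
     * Undoes a wrong decode: turns the string back into the bytes the
     * container saw, then decodes those bytes with the charset the
     * browser actually used.
     */
    public static String recode(String misdecoded, Charset containerCharset, Charset actualCharset) {
        if (misdecoded == null) {
            return null; // parameter was absent from the request
        }
        byte[] rawBytes = misdecoded.getBytes(containerCharset);
        return new String(rawBytes, actualCharset);
    }

    public static void main(String[] args) {
        // "Øie" followed by two Chinese characters, written as escapes so the
        // source file encoding does not matter.
        String original = "\u00D8ie \u4E2D\u6587";

        // Simulate the problem: the browser sends UTF-8 bytes and the
        // container decodes them as ISO-8859-1.
        String misdecoded = new String(
                original.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);

        String repaired = recode(misdecoded, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);

        System.out.println("garbled equals original:  " + misdecoded.equals(original));
        System.out.println("repaired equals original: " + repaired.equals(original));
    }
}
```

The round trip is lossless here because ISO-8859-1 maps every byte value to exactly one character. A possibly cleaner alternative for POSTed forms is to call request.setCharacterEncoding("UTF-8") before the first getParameter() call; for GET requests on Tomcat, the URIEncoding attribute on the connector controls how the query string is decoded.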

Hope this helps!

Kind regards, Karl Øie

On 11. apr. 2005, at 11.54, Eric Chow wrote:

Hello,


I am a beginner in using Lucene.


My files contain different languages (English, Chinese,
Portuguese, Japanese and some other Asian, non-Latin languages).
Several languages are often mixed within a single file.
Therefore, I have to use UTF-8 to save the contents.

I am now developing a web-based search engine. I use Lucene to create
an index for those files and search it on the web. The charset of the web
page is UTF-8, but searches return nothing.

I tried several analyzers (CJKAnalyzer, ChineseAnalyzer,
StandardAnalyzer, SimpleAnalyzer), but it still failed.

Finally, I tested with the original charset, for example BIG5 for the
Chinese contents, and I can search it very well. For the
English contents, of course, there is no problem.

But I can't use UTF-8 as the charset for documents. Any suggestions or examples?


Best regards,
Eric

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

- ...I wonder if the really nerdy Klingons learn how to speak english?

