read this:
http://www.crazysquirrel.com/compgen/form-encoding.php
then in your receiving servlet:
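// The container decodes parameters as ISO-8859-1 by default, so reverse that first.
String raw = request.getParameter("query");
String query_string = new String(raw.getBytes("ISO-8859-1"), request.getCharacterEncoding());
then pass query_string to Lucene. This re-decodes the string returned by getParameter() using the encoding the browser actually sent. Note that getCharacterEncoding() returns null when the request does not specify a charset; in that case pass "UTF-8" explicitly.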
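The re-decoding step can be tried outside a servlet container. Below is a minimal sketch, assuming the container decoded the parameter bytes as ISO-8859-1 and the browser really sent UTF-8. The class name ParamDecoding and the method redecode are hypothetical, and StandardCharsets needs Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

public class ParamDecoding {
    // Undo an ISO-8859-1 decode and decode the same bytes as UTF-8 instead.
    // Assumption: the input was produced by decoding UTF-8 bytes as ISO-8859-1.
    static String redecode(String misdecoded) {
        byte[] wireBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(wireBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes so the source file encoding does not matter.
        String original = "\u4e2d\u6587";

        // Simulate what the container does: UTF-8 bytes on the wire, decoded as ISO-8859-1.
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);
        String misdecoded = new String(wire, StandardCharsets.ISO_8859_1);

        String fixed = redecode(misdecoded);
        System.out.println(fixed.equals(original)); // prints "true"
    }
}
```

The round trip works because ISO-8859-1 maps every byte value to exactly one character, so no bytes are lost in the wrong decode. In a servlet, calling request.setCharacterEncoding("UTF-8") before the first getParameter() call is an alternative for POST bodies; whether it also covers GET query strings depends on the container configuration.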
Hope this helps!
Best regards, Karl Øie
On 11 Apr 2005, at 11:54, Eric Chow wrote:
Hello,
I am a beginner in using Lucene.
My files contain different languages (English, Chinese, Portuguese, Japanese, and some other Asian, non-Latin languages). Several languages often appear in the same file, so I have to use UTF-8 to save the contents.
I am now developing a web-based search engine. I use Lucene to create an index for those files and search it from the web. The charset of the web page is UTF-8, but searches return nothing.
I tried several analyzers (CJKAnalyzer, ChineseAnalyzer, StandardAnalyzer, SimpleAnalyzer), and it still failed.
Finally, I tested with the original charset: for example, I used BIG5 for the Chinese content, and searching worked very well. For English content, of course, there is no problem.
But I can't use UTF-8 as the charset for the documents. Any suggestions or examples?
Best regards, Eric
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
- ...I wonder if the really nerdy Klingons learn how to speak english?
