Urgent, please help, index/search in UTF-8 ???

Eric Chow Mon, 11 Apr 2005 03:02:03 -0700

Hello,

I am a beginner in using Lucene.


My files are contains different language (English, Chinese,
Portuguese, Japanese and some Asian languages, non-latin languages).
They always contain in one file.
Therefore, I have to use UTF-8 to save the contents.

I am now developing a web-based search engine. I use Lucene to create
index for those files and search it in web. The charset of the web
page is UTF-8, but it cannot search anything.

I try to use some Analyser (CJKAnalyser, ChineseAnalyser,
StandardAnalyser, SimpleAnalyser), still failed.

Finally, I tested to use original charset, for example, the Chinese
contents I used BIG5, and I can search it very well. For those
English, of couse, no problem.

But I can't use UTF-8 as the charset for documents. Any suggest and examples ?

Best regards,
Eric

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Urgent, please help, index/search in UTF-8 ???

Reply via email to