The best Chinese Analyzer?

2006-05-08 Thread Bob Cheung
I have a question for those who have used Lucene to index and search Chinese characters: what is the best Analyzer for the job? I know all three of these can do the job:
1. StandardAnalyzer
2. CJKAnalyzer
3. ChineseAnalyzer
What are the differences between these three analyzers? TIA. Regards, Bob
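To make the question concrete, here is a minimal plain-Java sketch of the two tokenization strategies usually associated with these analyzers. As far as I know, CJKAnalyzer emits overlapping two-character tokens, while ChineseAnalyzer (and StandardAnalyzer, for CJK text) emits one token per character; treat that as an assumption to verify against the Lucene version in use. The class and method names below are hypothetical and nothing here calls the Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: plain Java, not Lucene code.
class CjkTokenSketch {
    // One token per code point (the per-character strategy).
    static List<String> unigrams(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints().forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    // Overlapping pairs of adjacent code points (the bigram strategy).
    static List<String> bigrams(String text) {
        int[] cps = text.codePoints().toArray();
        List<String> tokens = new ArrayList<>();
        if (cps.length == 1) {
            // A lone character has no neighbor, so keep it as its own token.
            tokens.add(new String(cps, 0, 1));
        }
        for (int i = 0; i + 1 < cps.length; i++) {
            tokens.add(new String(cps, i, 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587\u5206\u6790"; // four Chinese characters
        System.out.println("unigrams: " + unigrams(text));
        System.out.println("bigrams:  " + bigrams(text));
    }
}
```

With bigrams, a two-character query matches only where those two characters are adjacent in the text. With per-character tokens, the same query has to be run as a phrase to get that effect. Index size and match precision differ accordingly.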

Indexing and searching with StandardAnalyzer

2006-05-08 Thread Bob Cheung
Using StandardAnalyzer, I was able to index a document containing the string co_cc (without quotes) but I couldn't search for it. Using Luke, I was able to see co_cc was indexed. Using Luke to search, I was not able to find any hit using StandardAnalyzer. However, if I use KeywordAnalyzer to

Sorting in Lucene

2006-03-13 Thread Bob Cheung
I am curious why the character / sorts before the space. For example, "Apple/banana is good for you" sorts before "Apple banana is good for you". Is there something I can do to make it sort correctly? Regards, Bob
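As a sanity check on the expected order, the following plain-Java sketch sorts the two example strings by their natural `String` order, which compares UTF-16 code units. The space is U+0020 and the slash is U+002F, so under this comparison the space sorts first. If an index returns the opposite order, the sort is probably not comparing the whole raw strings this way; that is an inference, not something confirmed in the thread. The class and method names are hypothetical.

```java
import java.util.Arrays;

// Illustration only: plain Java, not Lucene code.
class SlashVersusSpaceSketch {
    // Returns a sorted copy using String's natural (code-unit) ordering.
    static String[] sortedByCodeUnit(String... values) {
        String[] copy = values.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        String[] sorted = sortedByCodeUnit(
                "Apple/banana is good for you",
                "Apple banana is good for you");
        for (String s : sorted) {
            System.out.println(s);
        }
        System.out.println("' ' = " + (int) ' ' + ", '/' = " + (int) '/');
    }
}
```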

RE: Sorting in Lucene

2006-03-13 Thread Bob Cheung
On 3/13/06, Bob Cheung wrote:
> I am curious why the character / sorts before the space. For example, "Apple/banana is good for you" sorts before "Apple banana is good for you".

Are you sure that the field is untokenized

RE: Indexing multiple languages

2005-06-02 Thread Bob Cheung
Hi Erik, I am a newcomer to this list, so please allow me to ask a dumb question. Will the StandardAnalyzer have to be modified to accept different character encodings? We have customers in China, Taiwan and Hong Kong. Chinese data may come in 3 different encodings: Big5, GB and UTF-8.
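One way to look at the encoding question: Java strings are Unicode internally, so the bytes can be decoded with the right charset before any analyzer sees the text. The sketch below shows only that decoding step in plain Java. It assumes the encoding of each source is already known, it uses `GBK` as a stand-in for "GB", and the class and method names are hypothetical. Whether Big5 and GBK are available depends on the Java runtime, so the demo checks `Charset.isSupported` first.

```java
import java.nio.charset.Charset;

// Illustration only: plain Java, not Lucene code.
class DecodeBeforeIndexingSketch {
    // Decode raw bytes using the charset the source is known to use.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String original = "\u4e2d\u6587"; // two Chinese characters
        for (String name : new String[] {"UTF-8", "Big5", "GBK"}) {
            if (!Charset.isSupported(name)) {
                System.out.println(name + ": not supported by this runtime");
                continue;
            }
            // Simulate incoming bytes in this encoding, then decode them.
            byte[] raw = original.getBytes(Charset.forName(name));
            String text = decode(raw, name);
            System.out.println(name + ": " + raw.length
                    + " bytes, round trip ok = " + text.equals(original));
        }
    }
}
```

Once decoded, text from all three sources is an ordinary `String` and can be passed to the same analyzer.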