Re: QueryParser

Herb Roitblat Mon, 24 Mar 2014 06:01:28 -0700

The default query parser for CJK languages breaks text into bigrams. A word consisting of the characters ABCDE is broken into the tokens AB, BC, CD, DE. For example, "轻歌曼舞庆元旦" is broken into

data:轻歌 data:歌曼 data:曼舞 data:舞庆 data:庆元 data:元旦
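To make the bigram behavior concrete, here is a minimal, stdlib-only Java sketch that splits a run of CJK characters into overlapping two-character tokens and prints them in the same field:term form shown above. It is a simplified illustration of the technique, not Lucene's actual CJKBigramFilter code. The class name BigramSketch is hypothetical, and the field name "data" is taken from the output above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class BigramSketch {

    // Split text into overlapping two-code-point tokens: ABCDE -> AB, BC, CD, DE.
    // A single-character input is returned as a lone unigram (simplifying assumption).
    static List<String> bigrams(String text) {
        int[] cps = text.codePoints().toArray();
        List<String> tokens = new ArrayList<>();
        if (cps.length == 1) {
            tokens.add(new String(cps, 0, 1));
            return tokens;
        }
        for (int i = 0; i + 1 < cps.length; i++) {
            tokens.add(new String(cps, i, 2));
        }
        return tokens;
    }

    // Render tokens the way the parsed query prints them: field:term, space-separated.
    static String asQueryString(String field, List<String> tokens) {
        return tokens.stream()
                .map(t -> field + ":" + t)
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(bigrams("ABCDE"));
        System.out.println(asQueryString("data", bigrams("轻歌曼舞庆元旦")));
    }
}
```

A seven-character input always yields six bigrams, which is why the example above produces six data: clauses.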

Each pair may or may not be a word, but if you use the same parser (i.e., analyzer) for indexing and for searching, you should get reasonable results. A more powerful parser, typically one that includes a dictionary, is available; it may give more linguistically sensible segmentations at the cost of being slower.
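The point about using the same analyzer on both sides can be illustrated with a toy in-memory index. The sketch below uses only the Java standard library; the class and method names are hypothetical, and the "index" is just a set of terms, not a Lucene index. It indexes a document with bigram analysis, then analyzes the query "庆元旦" two ways: with the same bigram analysis, and with no analysis at all (the whole string as one term). Requiring every query term to match is a simplification chosen for this example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AnalyzerConsistencySketch {

    // Toy bigram analysis: overlapping two-character terms.
    static List<String> bigramAnalyze(String text) {
        int[] cps = text.codePoints().toArray();
        List<String> terms = new ArrayList<>();
        for (int i = 0; i + 1 < cps.length; i++) {
            terms.add(new String(cps, i, 2));
        }
        if (cps.length == 1) {
            terms.add(text); // lone character kept as a unigram
        }
        return terms;
    }

    // Toy "no analysis": the whole input becomes a single term.
    static List<String> keywordAnalyze(String text) {
        return List.of(text);
    }

    // Simplified matching rule (assumption): every query term must be in the index.
    static boolean matches(Set<String> indexedTerms, List<String> queryTerms) {
        return !queryTerms.isEmpty() && indexedTerms.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        Set<String> index = new HashSet<>(bigramAnalyze("轻歌曼舞庆元旦"));
        String query = "庆元旦";
        // Same analysis at index and query time: 庆元 and 元旦 were both indexed.
        System.out.println("bigram query matches:  " + matches(index, bigramAnalyze(query)));
        // Different analysis: the single term 庆元旦 was never indexed.
        System.out.println("keyword query matches: " + matches(index, keywordAnalyze(query)));
    }
}
```

The first lookup succeeds and the second fails, even though the query text is identical; only the analysis differs.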

Look here, for example: http://lucene.apache.org/core/4_0_0/analyzers-common/index.html

and here: http://lucene.apache.org/core/4_0_0/analyzers-smartcn/index.html
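For a rough idea of what dictionary-based segmentation does differently from bigrams, here is a sketch of forward maximum matching. This is a classic technique that is much simpler than what smartcn actually does (smartcn combines a dictionary with a statistical model), so treat it as an illustration only. The class name and the tiny dictionary are hypothetical and chosen just for this example; a real segmenter relies on a large lexicon.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ForwardMaxMatchSketch {

    // Greedy forward maximum matching: at each position take the longest dictionary
    // word (up to maxWordLength characters); if nothing matches, emit one character.
    static List<String> segment(String text, Set<String> dictionary, int maxWordLength) {
        if (maxWordLength < 1) {
            throw new IllegalArgumentException("maxWordLength must be at least 1");
        }
        int[] cps = text.codePoints().toArray();
        List<String> words = new ArrayList<>();
        int i = 0;
        while (i < cps.length) {
            int len = Math.min(maxWordLength, cps.length - i);
            while (len > 1 && !dictionary.contains(new String(cps, i, len))) {
                len--;
            }
            words.add(new String(cps, i, len));
            i += len;
        }
        return words;
    }

    public static void main(String[] args) {
        // Hypothetical toy dictionary, for illustration only.
        Set<String> dict = Set.of("轻歌曼舞", "轻歌", "曼舞", "元旦", "庆");
        System.out.println(segment("轻歌曼舞庆元旦", dict, 4));
    }
}
```

With this dictionary the phrase is segmented as 轻歌曼舞 / 庆 / 元旦 rather than six overlapping pairs, so fewer, more meaningful terms end up in the index; the cost is the dictionary lookups and the need to maintain the lexicon.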



On 3/23/2014 11:21 PM, kalaik wrote:

Dear Team,

Any update?

---- On Fri, 21 Mar 2014 14:40:51 +0530 kalaik <kalaiselva...@zohocorp.com> wrote ----

Dear Team,

We are using Lucene in our product. Searching works well and is fast, but Japanese, Chinese, and Korean text is not being searched properly. We are using QueryParser.


QueryParser splits a word such as "轻歌曼舞庆元旦" into pieces.

Example:

The word "轻歌曼舞庆元旦" is split into: data:轻歌 data:歌曼 data:曼舞 data:舞庆 data:庆元 data:元旦


Here is my code:

    Query query = parser.parse(searchData);
    logger.log(Level.INFO, "Search Query is calling {0}", query);
    TopDocs docs = is.search(query, resultRowSize);



If you need any clarification, please get back to me. Please help as soon as possible.


Regards,
kalai..



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
