The default query parser for CJK languages breaks text into bigrams. A word consisting of the characters ABCDE is broken into the tokens AB, BC, CD, DE, so your example

"轻歌曼舞庆元旦"

becomes
data:轻歌 data:歌曼 data:曼舞 data:舞庆 data:庆元 data:元旦
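
To make the splitting concrete, here is a minimal sketch in plain Java. It is not Lucene's own analyzer code; the class and method names (BigramSketch, bigrams) are made up for illustration, and it assumes the input contains only CJK characters with no whitespace or punctuation.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a hypothetical helper, not part of Lucene.
class BigramSketch {

    // Returns the overlapping two-character tokens of the text.
    // Input shorter than two characters yields no tokens in this sketch.
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        // Work on code points so characters outside the BMP are not cut in half.
        int[] cps = text.codePoints().toArray();
        for (int i = 0; i + 1 < cps.length; i++) {
            tokens.add(new String(cps, i, 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [轻歌, 歌曼, 曼舞, 舞庆, 庆元, 元旦]
        System.out.println(bigrams("轻歌曼舞庆元旦"));
    }
}
```

The "data:" prefix in the query output above is just the field name that the query parser attaches to each token.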

Each pair may or may not be a real word, but as long as you use the same parser (i.e. analyzer) for indexing and for searching, you should get reasonable results. More powerful analyzers, typically ones that include a dictionary, are available and may produce segmentations closer to what you expect, at the cost of being slower.
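
Why the same analysis on both sides matters can be shown with a toy in-memory index. This is only a sketch under simplifying assumptions: SameAnalyzerSketch and its methods are hypothetical names, not Lucene classes, and the search simply requires every query bigram to be present in a document, ignoring positions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustration only: a toy inverted index, not Lucene.
class SameAnalyzerSketch {

    // term -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // The single "analyzer" shared by indexing and searching.
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        int[] cps = text.codePoints().toArray();
        for (int i = 0; i + 1 < cps.length; i++) {
            tokens.add(new String(cps, i, 2));
        }
        return tokens;
    }

    void add(int docId, String text) {
        for (String term : bigrams(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Returns the documents that contain every bigram of the query.
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : bigrams(query)) {
            Set<Integer> docs = postings.getOrDefault(term, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        SameAnalyzerSketch index = new SameAnalyzerSketch();
        index.add(1, "轻歌曼舞庆元旦");
        index.add(2, "庆元旦晚会");
        System.out.println(index.search("元旦")); // prints [1, 2]
        System.out.println(index.search("曼舞")); // prints [1]
        System.out.println(index.search("歌舞")); // prints []
    }
}
```

The query "歌舞" finds nothing because that pair never occurs as adjacent characters in the indexed text. If the query were tokenized differently from the documents (for example as whole dictionary words), even "元旦" could fail to match the indexed bigrams.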

Look here, for example: http://lucene.apache.org/core/4_0_0/analyzers-common/index.html
and here: http://lucene.apache.org/core/4_0_0/analyzers-smartcn/index.html



On 3/23/2014 11:21 PM, kalaik wrote:
Dear Team,

                 Any Update ?








---- On Fri, 21 Mar 2014 14:40:51 +0530 kalaik 
<kalaiselva...@zohocorp.com> wrote ----




Dear Team,

We are using Lucene in our product. It works well for high-speed searching and performance, but Japanese, Chinese and Korean text is not searched properly. We use QueryParser.

QueryParser splits a word like "轻歌曼舞庆元旦" into pieces.

Example:
The word "轻歌曼舞庆元旦" is split into: data:轻歌 data:歌曼 data:曼舞 data:舞庆 data:庆元 data:元旦

here is my code

    Query query = parser.parse(searchData);
    logger.log(Level.INFO, "Search Query is calling {0}", query);
    TopDocs docs = is.search(query, resultRowSize);


If you need any clarification, please get back to me. Please help as soon as possible.


Regards,
kalai..













