The default query parser for CJK languages breaks text into overlapping
bigrams. A word consisting of the characters ABCDE is broken into the
tokens AB, BC, CD, DE; likewise
"轻歌曼舞庆元旦"
is broken into
data:轻歌 data:歌曼 data:曼舞 data:舞庆 data:庆元 data:元旦
Each pair may or may not be a real word, but if you use the same parser
(i.e. analyzer) for indexing and for searching, you should get reasonable
results. A more powerful parser, typically one that includes a
dictionary, is also available; it may produce segmentations closer to
what you expect, at the cost of being slower.
Look here, for example:
http://lucene.apache.org/core/4_0_0/analyzers-common/index.html
and here: http://lucene.apache.org/core/4_0_0/analyzers-smartcn/index.html
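
To make the bigram splitting described above concrete, here is a minimal
sketch in plain Java. It does not use Lucene and is not how Lucene
implements it; the class name CjkBigramSketch and the method bigrams are
made up for illustration, and the handling of a single lone character
(emitted as one token) is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: shows overlapping bigram tokens.
class CjkBigramSketch {

    // Returns overlapping two-character tokens: ABCDE -> AB, BC, CD, DE.
    static List<String> bigrams(String text) {
        // Work on code points so characters outside the BMP are not split.
        int[] cps = text.codePoints().toArray();
        List<String> tokens = new ArrayList<>();
        if (cps.length == 1) {
            // Assumption of this sketch: a lone character becomes one token.
            tokens.add(text);
            return tokens;
        }
        for (int i = 0; i + 1 < cps.length; i++) {
            // Build a token from the code points at positions i and i + 1.
            tokens.add(new String(cps, i, 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(bigrams("ABCDE"));
        System.out.println(bigrams("轻歌曼舞庆元旦"));
    }
}
```

If the indexed text and the query string are both split this way, they
produce the same kind of tokens, which is why using the same analyzer on
both sides matters.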
On 3/23/2014 11:21 PM, kalaik wrote:
Dear Team,
Any update?
---- On Fri, 21 Mar 2014 14:40:51 +0530 kalaik
<kalaiselva...@zohocorp.com> wrote ----
Dear Team,
We are using Lucene in our product. It works well, with fast,
high-performance search, but
Japanese, Chinese and Korean text is not searched properly.
We use QueryParser.
QueryParser splits a word such as "轻歌曼舞庆元旦" into pieces.
Example:
This word: "轻歌曼舞庆元旦"
Split words: data:轻歌 data:歌曼 data:曼舞 data:舞庆 data:庆元 data:元旦
Here is my code:
Query query = parser.parse(searchData);
logger.log(Level.INFO,"Search Query is calling {0}",query);
TopDocs docs = is.search(query, resultRowSize);
If you need any clarification, please get back to me. Please help as soon as
possible.
Regards,
kalai..