StandardTokenizer splitting all of Korean words into separate characters
------------------------------------------------------------------------
Key: LUCENE-461
URL: http://issues.apache.org/jira/browse/LUCENE-461
Project: Lucene - Java
Type: Bug
Components: Analysis
Environment: Analyzing Korean text with Apache Lucene, esp. with
StandardAnalyzer.
Reporter: Cheolgoo Kang
Priority: Minor
StandardTokenizer splits every Korean word into separate single-character tokens.
For example, "안녕하세요" is one Korean word that means "Hello", but
StandardAnalyzer separates it into five tokens: "안", "녕", "하", "세", "요".
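
To make the difference concrete, here is a minimal sketch in plain Java that
uses no Lucene classes. It only contrasts the reported per-character output
with the output one would expect for a Korean word. It is not Lucene's
tokenizer grammar and not a proposed patch. The names HangulTokenSketch,
splitPerCharacter and groupHangulRuns are hypothetical, and treating any
contiguous run of Hangul characters as one token is an assumption made for
illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: plain JDK code, not Lucene's StandardTokenizer.
class HangulTokenSketch {

    // Mimics the reported behavior: each letter or digit code point
    // becomes its own token.
    static List<String> splitPerCharacter(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints()
            .filter(Character::isLetterOrDigit)
            .forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    // Assumed expected behavior: a contiguous run of Hangul characters
    // forms one token; any non-Hangul character ends the current run.
    static List<String> groupHangulRuns(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HANGUL) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String word = "안녕하세요";
        System.out.println(splitPerCharacter(word)); // [안, 녕, 하, 세, 요]
        System.out.println(groupHangulRuns(word));   // [안녕하세요]
    }
}
```

The first method reproduces the five tokens described above; the second
returns the whole word as one token, which is the result the report expects.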