StandardTokenizer splitting all of Korean words into separate characters
------------------------------------------------------------------------
         Key: LUCENE-461
         URL: http://issues.apache.org/jira/browse/LUCENE-461
     Project: Lucene - Java
        Type: Bug
  Components: Analysis
 Environment: Analyzing Korean text with Apache Lucene, especially with StandardAnalyzer.
    Reporter: Cheolgoo Kang
    Priority: Minor

StandardTokenizer splits every Korean word into separate single-character tokens. For example, "안녕하세요" is a single Korean word meaning "Hello", but StandardAnalyzer splits it into five tokens: "안", "녕", "하", "세", "요".
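To make the report concrete, here is a minimal, self-contained sketch that contrasts the behavior described above with what a reader would probably expect. It does not use or reproduce Lucene's StandardTokenizer. The class HangulTokenSketch and its methods are hypothetical names introduced only for illustration, and grouping contiguous syllables is just one plausible expectation, not the project's actual fix. The only external fact it relies on is that precomposed Hangul syllables occupy the Unicode range U+AC00 to U+D7A3.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: this is NOT Lucene's StandardTokenizer.
// All names here are hypothetical and exist just to show the two behaviors.
class HangulTokenSketch {

    // True for precomposed Hangul syllables (Unicode range U+AC00..U+D7A3).
    static boolean isHangulSyllable(int codePoint) {
        return codePoint >= 0xAC00 && codePoint <= 0xD7A3;
    }

    // Mimics the reported behavior: each Hangul syllable becomes its own token.
    static List<String> perCharacterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isHangulSyllable(cp)) {
                tokens.add(new String(Character.toChars(cp)));
            }
            i += Character.charCount(cp);
        }
        return tokens;
    }

    // One plausible expectation: a contiguous run of syllables is one token.
    static List<String> groupedTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isHangulSyllable(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                // A non-Hangul character ends the current run.
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The five-syllable word from the report, written with Unicode escapes.
        String word = "\uC548\uB155\uD558\uC138\uC694";
        System.out.println(perCharacterTokens(word));
        System.out.println(groupedTokens(word));
    }
}
```

Run on the word from the report, perCharacterTokens yields five one-syllable tokens, matching the behavior described, while groupedTokens yields the whole word as a single token.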