Never mind.... I see Otis beat me to it.

    Erik


On Oct 4, 2005, at 10:38 PM, Youngho Cho wrote:

Hello,

Is there any plan to add this patch to the Lucene core?
I am using CJKAnalyzer, but I hope to switch to StandardAnalyzer.

Thanks,

Youngho

----- Original Message -----
From: "Cheolgoo Kang (JIRA)" <[EMAIL PROTECTED]>
To: <java-dev@lucene.apache.org>
Sent: Tuesday, October 04, 2005 11:26 PM
Subject: [jira] Created: (LUCENE-444) StandardTokenizer loses Korean characters



StandardTokenizer loses Korean characters
-----------------------------------------

         Key: LUCENE-444
         URL: http://issues.apache.org/jira/browse/LUCENE-444
     Project: Lucene - Java
        Type: Bug
  Components: Analysis
    Reporter: Cheolgoo Kang
    Priority: Minor


While using StandardAnalyzer, especially StandardTokenizer, with a Korean text stream, StandardTokenizer ignores the Korean characters. This is because the definition of the CJK token in the StandardTokenizer.jj JavaCC file doesn't cover the range of Korean syllables defined in the Unicode character map. This patch adds one line, the Korean syllables range 0xAC00~0xD7AF, to the StandardTokenizer.jj code.
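A minimal way to see the problem is to print what StandardAnalyzer produces for a string that contains Hangul syllables from the U+AC00..U+D7AF block. The sketch below is illustrative only (the class name, field name, and sample text are made up); it uses the TokenStream.next()/Token.termText() API of the Lucene versions current at this time:

    import java.io.StringReader;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;

    public class KoreanTokenCheck {
        public static void main(String[] args) throws Exception {
            // Hangul syllables fall in U+AC00..U+D7AF; a tokenizer whose CJK
            // token definition omits that range will drop them entirely.
            String text = "\uC548\uB155\uD558\uC138\uC694 Lucene"; // "annyeonghaseyo Lucene"
            StandardAnalyzer analyzer = new StandardAnalyzer();
            TokenStream stream = analyzer.tokenStream("body", new StringReader(text));
            for (Token t = stream.next(); t != null; t = stream.next()) {
                System.out.println(t.termText() + " [" + t.type() + "]");
            }
        }
    }

Without the extra range only the "lucene" token comes back; once the grammar covers the Hangul block, the Korean text appears in the output as well.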


