Never mind.... I see Otis beat me to it.
Erik
On Oct 4, 2005, at 10:38 PM, Youngho Cho wrote:
Hello,
Is there any plan to add this patch to the Lucene core?
I am using CJKAnalyzer, but I hope to switch to StandardAnalyzer.
Thanks,
Youngho
----- Original Message -----
From: "Cheolgoo Kang (JIRA)" <[EMAIL PROTECTED]>
To: <java-dev@lucene.apache.org>
Sent: Tuesday, October 04, 2005 11:26 PM
Subject: [jira] Created: (LUCENE-444) StandardTokenizer loses Korean characters
StandardTokenizer loses Korean characters
-----------------------------------------
Key: LUCENE-444
URL: http://issues.apache.org/jira/browse/LUCENE-444
Project: Lucene - Java
Type: Bug
Components: Analysis
Reporter: Cheolgoo Kang
Priority: Minor
While using StandardAnalyzer, specifically StandardTokenizer, with a Korean
text stream, StandardTokenizer ignores the Korean characters. This is
because the definition of the CJK token in the StandardTokenizer.jj
JavaCC file does not cover the range of Korean syllables defined in the
Unicode character map.
This patch adds one line with 0xAC00~0xD7AF, the Korean (Hangul) syllables
range, to the StandardTokenizer.jj grammar.
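For anyone following along, the change being described would look roughly
like the fragment below in StandardTokenizer.jj. This is only a sketch:
the token name and the pre-existing ranges are illustrative placeholders,
and only the added Hangul Syllables line comes from the report itself.

    // Sketch of the CJK token production in StandardTokenizer.jj.
    // The existing ranges shown here are placeholders; the last line
    // is the one-line addition described in the issue.
    TOKEN : {
      < CJK:                               // CJK characters, emitted one per token
          [
           "\u3040"-"\u318f",              // illustrative existing range (Kana, etc.)
           "\u4e00"-"\u9fff",              // illustrative existing range (CJK ideographs)
           "\uac00"-"\ud7af"               // Hangul Syllables: the added range
          ]
      >
    }

With that range included, Hangul syllables should come out as CJK tokens
instead of being silently discarded.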