Re: [jira] Created: (LUCENE-444) StandardTokenizer loses Korean characters

Otis Gospodnetic Tue, 04 Oct 2005 20:55:33 -0700

Try the version from SVN; I just applied Cheolgoo's patch.

Otis


--- Youngho Cho <[EMAIL PROTECTED]> wrote:

> Hello,
> 
> Is there any plan to add this patch to the Lucene core?
> I am using CJKAnalyzer, but I hope to switch to StandardAnalyzer.
> 
> Thanks,
> 
> Youngho
> 
> ----- Original Message ----- 
> From: "Cheolgoo Kang (JIRA)" <[EMAIL PROTECTED]>
> To: <[email protected]>
> Sent: Tuesday, October 04, 2005 11:26 PM
> Subject: [jira] Created: (LUCENE-444) StandardTokenizer loses Korean
> characters
> 
> 
> > StandardTokenizer loses Korean characters
> > -----------------------------------------
> > 
> >          Key: LUCENE-444
> >          URL: http://issues.apache.org/jira/browse/LUCENE-444
> >      Project: Lucene - Java
> >         Type: Bug
> >   Components: Analysis  
> >     Reporter: Cheolgoo Kang
> >     Priority: Minor
> > 
> > 
> > When StandardAnalyzer (more precisely, StandardTokenizer) is applied to a
> > Korean text stream, StandardTokenizer ignores the Korean characters. This
> > is because the CJK token definition in the StandardTokenizer.jj JavaCC
> > file does not cover the Korean syllables range described in the Unicode
> > character map.
> > This patch adds one line to StandardTokenizer.jj covering 0xAC00-0xD7AF,
> > the Korean syllables range.
> > 
> > -- 
> > This message is automatically generated by JIRA.
> > -
> > If you think it was sent incorrectly contact one of the
> > administrators:
> >    http://issues.apache.org/jira/secure/Administrators.jspa
> > -
> > For more information on JIRA, see:
> >    http://www.atlassian.com/software/jira
> > 
> > 
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
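To illustrate what the added range covers, here is a minimal Java sketch that assumes only the range quoted in the issue (0xAC00-0xD7AF). It is not the JavaCC grammar from StandardTokenizer.jj. The class and method names (HangulRangeCheck, inPatchRange, inJdkHangulBlock, keepHangul) are hypothetical. The cross-check uses the JDK's Character.UnicodeBlock.HANGUL_SYLLABLES, which spans the same block.

```java
// Hypothetical illustration only; this is not Lucene's StandardTokenizer.jj grammar.
// It checks characters against the Hangul syllables range (U+AC00 to U+D7AF)
// that the LUCENE-444 patch adds to the CJK token definition.
class HangulRangeCheck {

    // Bounds of the range quoted in the issue.
    static final char HANGUL_START = '\uAC00';
    static final char HANGUL_END = '\uD7AF';

    // True if c falls inside the range the patch adds.
    static boolean inPatchRange(char c) {
        return c >= HANGUL_START && c <= HANGUL_END;
    }

    // Cross-check against the JDK's own Unicode block table.
    static boolean inJdkHangulBlock(char c) {
        return Character.UnicodeBlock.of(c) == Character.UnicodeBlock.HANGUL_SYLLABLES;
    }

    // Keeps only the characters a range-based rule like the patched one would accept.
    static String keepHangul(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inPatchRange(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "Lucene " followed by three Hangul syllables, written as escapes
        // so the source file does not depend on its encoding.
        String sample = "Lucene \uD55C\uAD6D\uC5B4";
        System.out.println(keepHangul(sample).length()); // prints 3

        // Every char in the quoted range should also be in the JDK's block.
        int mismatches = 0;
        for (char c = HANGUL_START; c <= HANGUL_END; c++) {
            if (inPatchRange(c) != inJdkHangulBlock(c)) {
                mismatches++;
            }
        }
        System.out.println(mismatches); // prints 0
    }
}
```

Before the patch, a rule whose ranges omitted this block would skip these characters, which matches the "ignores the Korean characters" symptom described above.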


