[ http://issues.apache.org/jira/browse/LUCENE-478?page=comments#action_12361679 ]

Daniel Naber commented on LUCENE-478:
-------------------------------------

John, I'm not sure I understand: do you think that this issue can be closed now? If not, could you ask your i18n experts how your changes could be integrated into the current code (the one where K/Korean and CJ are separate things)?

> CJK char list
> -------------
>
>          Key: LUCENE-478
>          URL: http://issues.apache.org/jira/browse/LUCENE-478
>      Project: Lucene - Java
>         Type: Bug
>   Components: Analysis
>     Versions: 1.4
>     Reporter: John Wang
>     Priority: Minor
>
> Seems the character list in the CJK section of the StandardTokenizer.jj is
> not quite complete. Following is a more complete list:
>   < CJK:                                          // non-alphabets
>       [
>        "\u1100"-"\u11ff",
>        "\u3040"-"\u30ff",
>        "\u3130"-"\u318f",
>        "\u31f0"-"\u31ff",
>        "\u3300"-"\u337f",
>        "\u3400"-"\u4dbf",
>        "\u4e00"-"\u9fff",
>        "\uac00"-"\ud7a3",
>        "\uf900"-"\ufaff",
>        "\uff65"-"\uffdc"
>       ]
>
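
For anyone who wants to sanity-check the proposed ranges outside the grammar, here is a minimal Java sketch. It is not Lucene code: the class name CjkRanges and the method isCjk are hypothetical, and the table simply copies the ten ranges from the issue description. It does not address how the ranges should be split between the separate Korean and CJ token types mentioned in the comment.

```java
public final class CjkRanges {

    // Inclusive [start, end] code unit pairs, copied from the list
    // proposed in the issue description (assumed to be complete as quoted).
    private static final int[][] RANGES = {
        {0x1100, 0x11ff}, // Hangul Jamo
        {0x3040, 0x30ff}, // Hiragana and Katakana
        {0x3130, 0x318f}, // Hangul Compatibility Jamo
        {0x31f0, 0x31ff}, // Katakana Phonetic Extensions
        {0x3300, 0x337f}, // part of CJK Compatibility
        {0x3400, 0x4dbf}, // CJK Unified Ideographs Extension A
        {0x4e00, 0x9fff}, // CJK Unified Ideographs
        {0xac00, 0xd7a3}, // Hangul Syllables
        {0xf900, 0xfaff}, // CJK Compatibility Ideographs
        {0xff65, 0xffdc}, // halfwidth Katakana and Hangul forms
    };

    private CjkRanges() {
        // Utility class, not meant to be instantiated.
    }

    /** Returns true if the given char falls inside one of the listed ranges. */
    public static boolean isCjk(char c) {
        for (int[] range : RANGES) {
            if (c >= range[0] && c <= range[1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isCjk((char) 0x4e2d)); // a CJK ideograph
        System.out.println(isCjk((char) 0xd55c)); // a Hangul syllable
        System.out.println(isCjk('a'));           // a Latin letter
    }
}
```

A linear scan over ten ranges is enough for a quick check like this; the tokenizer itself matches these ranges through the generated JavaCC lexer, not through code of this shape.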