[ http://issues.apache.org/jira/browse/LUCENE-478?page=comments#action_12361497 ]
John Wang commented on LUCENE-478:
----------------------------------

Yes I am. Our i18n team has provided a more up-to-date list and I thought I'd contribute it back.

-John

> CJK char list
> -------------
>
>          Key: LUCENE-478
>          URL: http://issues.apache.org/jira/browse/LUCENE-478
>      Project: Lucene - Java
>         Type: Bug
>   Components: Analysis
>     Versions: 1.4
>     Reporter: John Wang
>     Priority: Minor
>
> Seems the character list in the CJK section of the StandardTokenizer.jj is
> not quite complete. Following is a more complete list:
> < CJK:                                          // non-alphabets
>       [
>        "\u1100"-"\u11ff",
>        "\u3040"-"\u30ff",
>        "\u3130"-"\u318f",
>        "\u31f0"-"\u31ff",
>        "\u3300"-"\u337f",
>        "\u3400"-"\u4dbf",
>        "\u4e00"-"\u9fff",
>        "\uac00"-"\ud7a3",
>        "\uf900"-"\ufaff",
>        "\uff65"-"\uffdc"
>       ]
>   >
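
For anyone who wants to sanity-check the proposed ranges outside the grammar, here is a minimal Java sketch of the same membership test. The class name CjkRanges and the method isCjk are hypothetical and not part of Lucene; the table simply copies the ten ranges quoted in this issue, and the block names in the comments are the Unicode block names for those code points. It does not reproduce how the JavaCC-generated tokenizer actually matches input.

```java
// Hypothetical helper, not part of Lucene: mirrors the CJK character
// ranges proposed in LUCENE-478 so they can be checked in isolation.
final class CjkRanges {

    // Inclusive {start, end} code unit pairs, copied from the issue.
    private static final int[][] RANGES = {
        {0x1100, 0x11ff}, // Hangul Jamo
        {0x3040, 0x30ff}, // Hiragana and Katakana
        {0x3130, 0x318f}, // Hangul Compatibility Jamo
        {0x31f0, 0x31ff}, // Katakana Phonetic Extensions
        {0x3300, 0x337f}, // CJK Compatibility
        {0x3400, 0x4dbf}, // CJK Unified Ideographs Extension A
        {0x4e00, 0x9fff}, // CJK Unified Ideographs
        {0xac00, 0xd7a3}, // Hangul Syllables
        {0xf900, 0xfaff}, // CJK Compatibility Ideographs
        {0xff65, 0xffdc}  // Halfwidth Katakana and Hangul forms
    };

    private CjkRanges() {
    }

    // Returns true if c falls inside any of the proposed ranges.
    static boolean isCjk(char c) {
        for (int[] range : RANGES) {
            if (c >= range[0] && c <= range[1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isCjk((char) 0x4e2d)); // a CJK unified ideograph
        System.out.println(isCjk((char) 0xac00)); // first Hangul syllable
        System.out.println(isCjk('a'));           // ASCII letter
    }
}
```

A linear scan is enough for ten ranges; the ranges only cover the Basic Multilingual Plane, so a char argument suffices here.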