[ http://issues.apache.org/jira/browse/LUCENE-478?page=all ]

Otis Gospodnetic resolved LUCENE-478.
-------------------------------------

    Resolution: Fixed

Thanks, I committed Steven Rowe's patch, although it doesn't seem to fully match what he said in comments above (e.g. in his patch, I don't see the range he mentioned in 5.b).

> CJK char list
> -------------
>
>                 Key: LUCENE-478
>                 URL: http://issues.apache.org/jira/browse/LUCENE-478
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 1.4
>            Reporter: John Wang
>         Assigned To: Otis Gospodnetic
>            Priority: Minor
>         Attachments: StandardTokenizer.jj.diff, StandardTokenizer.jj.diff
>
>
> Seems the character list in the CJK section of the StandardTokenizer.jj is
> not quite complete. Following is a more complete list:
> < CJK:                                          // non-alphabets
>       [
>        "\u1100"-"\u11ff",
>        "\u3040"-"\u30ff",
>        "\u3130"-"\u318f",
>        "\u31f0"-"\u31ff",
>        "\u3300"-"\u337f",
>        "\u3400"-"\u4dbf",
>        "\u4e00"-"\u9fff",
>        "\uac00"-"\ud7a3",
>        "\uf900"-"\ufaff",
>        "\uff65"-"\uffdc"
>       ]
>   >
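
For readers who want to check a character against the proposed list outside the grammar, here is a minimal Java sketch. The class name CjkRanges and the method isCjk are hypothetical and not part of Lucene. The ranges are copied from the issue description quoted above, which, per the resolution comment, may not match the committed patch exactly. It only looks at single UTF-16 code units, so supplementary characters are not handled.

```java
// Hypothetical helper, not part of Lucene: tests a char against the
// ranges proposed in the issue description (not necessarily the
// ranges that were finally committed).
final class CjkRanges {
    // Inclusive {start, end} pairs, copied from the proposed CJK list.
    private static final int[][] RANGES = {
        {0x1100, 0x11ff},
        {0x3040, 0x30ff},
        {0x3130, 0x318f},
        {0x31f0, 0x31ff},
        {0x3300, 0x337f},
        {0x3400, 0x4dbf},
        {0x4e00, 0x9fff},
        {0xac00, 0xd7a3},
        {0xf900, 0xfaff},
        {0xff65, 0xffdc},
    };

    private CjkRanges() {}

    // Returns true if c falls inside any of the proposed ranges.
    static boolean isCjk(char c) {
        for (int[] r : RANGES) {
            if (c >= r[0] && c <= r[1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isCjk((char) 0x4e2d)); // inside 4e00-9fff
        System.out.println(isCjk('A'));           // outside every range
    }
}
```

A linear scan is enough for ten ranges; the generated tokenizer would instead compile the character list into its own lookup tables.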