CJK char list
-------------

         Key: LUCENE-478
         URL: http://issues.apache.org/jira/browse/LUCENE-478
     Project: Lucene - Java
        Type: Bug
  Components: Analysis
    Versions: 1.4
    Reporter: John Wang
    Priority: Minor

The character list in the CJK section of StandardTokenizer.jj appears to be incomplete. The following is a more complete list:

    < CJK:                                          // non-alphabets
          [
           "\u1100"-"\u11ff",
           "\u3040"-"\u30ff",
           "\u3130"-"\u318f",
           "\u31f0"-"\u31ff",
           "\u3300"-"\u337f",
           "\u3400"-"\u4dbf",
           "\u4e00"-"\u9fff",
           "\uac00"-"\ud7a3",
           "\uf900"-"\ufaff",
           "\uff65"-"\uffdc"
          ]
    >
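
To check which characters the proposed list would cover, the ranges can be mirrored in a small standalone Java sketch. This is only an illustration: the class name CjkRanges and the method isCjk are hypothetical and are not part of Lucene or of the generated tokenizer. The Unicode block names in the comments are added for orientation and do not come from the grammar file.

```java
final class CjkRanges {
    // Inclusive [start, end] pairs copied from the proposed CJK token definition.
    // Block names are annotations added for this sketch.
    private static final int[][] RANGES = {
        {0x1100, 0x11ff}, // Hangul Jamo
        {0x3040, 0x30ff}, // Hiragana and Katakana
        {0x3130, 0x318f}, // Hangul Compatibility Jamo
        {0x31f0, 0x31ff}, // Katakana Phonetic Extensions
        {0x3300, 0x337f}, // CJK Compatibility
        {0x3400, 0x4dbf}, // CJK Unified Ideographs Extension A
        {0x4e00, 0x9fff}, // CJK Unified Ideographs
        {0xac00, 0xd7a3}, // Hangul Syllables
        {0xf900, 0xfaff}, // CJK Compatibility Ideographs
        {0xff65, 0xffdc}  // Halfwidth Katakana and Hangul variants
    };

    private CjkRanges() {}

    // Hypothetical helper: true if c falls inside any of the proposed ranges.
    static boolean isCjk(char c) {
        for (int[] range : RANGES) {
            if (c >= range[0] && c <= range[1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        char[] samples = {(char) 0x4e2d, (char) 0x3042, (char) 0xac00, 'a'};
        for (char c : samples) {
            System.out.println(Integer.toHexString(c) + " -> " + isCjk(c));
        }
    }
}
```

A linear scan is enough for ten ranges. The generated tokenizer performs its own matching, so this sketch is useful only for spot-checking individual code points against the list.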