CJK char list
-------------
Key: LUCENE-478
URL: http://issues.apache.org/jira/browse/LUCENE-478
Project: Lucene - Java
Type: Bug
Components: Analysis
Versions: 1.4
Reporter: John Wang
Priority: Minor
The character list in the CJK section of StandardTokenizer.jj seems to be
incomplete. The following is a more complete list:
< CJK: // non-alphabets
[
"\u1100"-"\u11ff",
"\u3040"-"\u30ff",
"\u3130"-"\u318f",
"\u31f0"-"\u31ff",
"\u3300"-"\u337f",
"\u3400"-"\u4dbf",
"\u4e00"-"\u9fff",
"\uac00"-"\ud7a3",
"\uf900"-"\ufaff",
"\uff65"-"\uffdc"
]
>
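To sanity-check which characters the proposed ranges cover, the list can be mirrored in plain Java. This is only a minimal sketch: the class name CjkRanges and the method isCjk are hypothetical helpers for illustration and are not part of Lucene or of the generated tokenizer. The ranges are copied verbatim from the list above.

```java
public final class CjkRanges {
    // Inclusive [start, end] pairs copied from the proposed CJK token
    // definition above. Hex literals are used instead of Unicode escapes
    // so the source stays plain ASCII.
    private static final int[][] RANGES = {
        {0x1100, 0x11ff},
        {0x3040, 0x30ff},
        {0x3130, 0x318f},
        {0x31f0, 0x31ff},
        {0x3300, 0x337f},
        {0x3400, 0x4dbf},
        {0x4e00, 0x9fff},
        {0xac00, 0xd7a3},
        {0xf900, 0xfaff},
        {0xff65, 0xffdc},
    };

    private CjkRanges() {}

    // Returns true if c falls inside any of the proposed ranges.
    public static boolean isCjk(char c) {
        for (int[] range : RANGES) {
            if (c >= range[0] && c <= range[1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isCjk((char) 0x4e2d)); // a Han ideograph
        System.out.println(isCjk((char) 0x3042)); // a Hiragana letter
        System.out.println(isCjk((char) 0xd55c)); // a Hangul syllable
        System.out.println(isCjk('a'));           // a Latin letter
    }
}
```

A linear scan over ten ranges is enough for spot checks like this; it is not meant to reflect how the JavaCC-generated tokenizer matches characters.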