Otis Gospodnetic (JIRA) wrote:
> [ http://issues.apache.org/jira/browse/LUCENE-478?page=all ]
>
> Otis Gospodnetic resolved LUCENE-478.
> -------------------------------------
>
> Resolution: Fixed
>
> Thanks, I committed Steven Rowe's patch, although it doesn't seem to
> fully match what he said in comments above (e.g. in his patch, I
> don't see the range he mentioned in 5.b).
Hi Otis,
Here's 5.b.:
5. Character ranges in John's list that are missing in
StandardTokenizer.jj, and that should be added to the newly
re-labeled <CJ> section:
5.b. [ U+3D2E - U+4DB5 ] (non-chars [ U+4DB6 - U+4DBF ] excluded)
CJK Ideograph Extension A.
This range was introduced in Unicode 3.0.
And here's the corresponding change from the patch:
"\u3300"-"\u337f",
- "\u3400"-"\u3d2d",
+ "\u3400"-"\u4db5",
"\u4e00"-"\u9fff",
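I don't understand - the change above raises the upper bound of the existing
range from \u3d2d to \u4db5, so it looks to me like it adds the range
[ U+3D2E - U+4DB5 ] mentioned in 5.b. (as an extension of the existing
entry rather than as a separate line).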
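For what it's worth, this can be double-checked with a few lines of Java. This is only a sketch: the class CjRangeCheck and its methods are made up for illustration and are not part of Lucene or of the patch, and the ranges are just the three entries visible in the hunk above, not the full <CJ> list.

```java
// Sketch only: CjRangeCheck, contains() and coversAll() are hypothetical
// helpers, not Lucene code. Ranges are copied from the patch hunk above.
final class CjRangeCheck {
    // The three entries shown in the hunk, before and after the patch.
    static final int[][] BEFORE = { {0x3300, 0x337f}, {0x3400, 0x3d2d}, {0x4e00, 0x9fff} };
    static final int[][] AFTER  = { {0x3300, 0x337f}, {0x3400, 0x4db5}, {0x4e00, 0x9fff} };

    // True if cp falls inside any inclusive [lo, hi] range.
    static boolean contains(int[][] ranges, int cp) {
        for (int[] r : ranges) {
            if (cp >= r[0] && cp <= r[1]) {
                return true;
            }
        }
        return false;
    }

    // True if every code point in [lo, hi] is matched by some range.
    static boolean coversAll(int[][] ranges, int lo, int hi) {
        for (int cp = lo; cp <= hi; cp++) {
            if (!contains(ranges, cp)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Range 5.b. is [ U+3D2E - U+4DB5 ].
        System.out.println("5.b. covered before patch: " + coversAll(BEFORE, 0x3d2e, 0x4db5));
        System.out.println("5.b. covered after patch:  " + coversAll(AFTER, 0x3d2e, 0x4db5));
        // The non-chars [ U+4DB6 - U+4DBF ] should stay unmatched.
        System.out.println("U+4DB6 matched after patch: " + contains(AFTER, 0x4db6));
    }
}
```

With the entries from the hunk, the 5.b. range is not covered before the patch and is covered after it, while the excluded non-chars stay outside every range.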
Are there other inconsistencies? (You said that 5.b. was an example.)
Steve