Otis Gospodnetic (JIRA) wrote:
> [ http://issues.apache.org/jira/browse/LUCENE-478?page=all ]
>
> Otis Gospodnetic resolved LUCENE-478.
> -------------------------------------
>
> Resolution: Fixed
>
> Thanks, I committed Steven Rowe's patch, although it doesn't seem to
> fully match what he said in comments above (e.g. in his patch, I
> don't see the range he mentioned in 5.b).

Hi Otis,

Here's 5.b.:

  5. Character ranges in John's list that are missing in
     StandardTokenizer.jj, and that should be added to the
     newly re-labeled <CJ> section:

     5.b. [ U+3D2E - U+4DB5 ] (non-chars [ U+4DB6 - U+4DBF ] excluded)
          CJK Ideograph Extension A. This range was introduced in
          Unicode 3.0.

And here's the corresponding change from the patch:

   "\u3300"-"\u337f",
-  "\u3400"-"\u3d2d",
+  "\u3400"-"\u4db5",
   "\u4e00"-"\u9fff",

I don't understand - it looks to me like the above change adds the
range mentioned in 5.b. Are there other inconsistencies? (You said
that 5.b. was an example.)

Steve
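
P.S. The range arithmetic can be sanity-checked mechanically. What follows is only a sketch: the class name CjkRangeCheck and its helper methods are made up for illustration and are not part of Lucene or of the patch. The only inputs are the bounds quoted above. It counts how many code points of the 5.b. range [ U+3D2E - U+4DB5 ] fall outside the "\u3400" grammar entry before and after the change.

```java
public class CjkRangeCheck {
    // True if cp lies in the inclusive range [lo, hi], mirroring one
    // "\uXXXX"-"\uYYYY" entry in the grammar's character list.
    static boolean inRange(int cp, int lo, int hi) {
        return cp >= lo && cp <= hi;
    }

    // Counts the code points in [from, to] that are NOT covered by [lo, hi].
    static int countMissing(int from, int to, int lo, int hi) {
        int missing = 0;
        for (int cp = from; cp <= to; cp++) {
            if (!inRange(cp, lo, hi)) {
                missing++;
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        int from = 0x3D2E; // start of the range named in 5.b.
        int to = 0x4DB5;   // end of the range named in 5.b.

        // Entry before the patch: "\u3400"-"\u3d2d"
        System.out.println("missing before patch: "
                + countMissing(from, to, 0x3400, 0x3D2D));

        // Entry after the patch: "\u3400"-"\u4db5"
        System.out.println("missing after patch:  "
                + countMissing(from, to, 0x3400, 0x4DB5));

        // The non-characters [ U+4DB6 - U+4DBF ] should stay excluded.
        System.out.println("U+4DB6 covered after patch: "
                + inRange(0x4DB6, 0x3400, 0x4DB5));
    }
}
```

If I have the arithmetic right, this reports 4232 code points missing before the patch and 0 after, with U+4DB6 still outside the entry, which is what 5.b. asks for.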