Hi Steven, I understood (and still do, actually) 5.b as "added U+3d2e - U+4DB5 and excluded....", so I expected to see the U+3d2e - U+4DB5 range in the patch, but I didn't see it. The closest range was this:
+ "\u3400"-"\u4db5",

I'm about to go on vacation (no TV, no radio, no email, no Internet, no java, just sea, salt, sun), so please have a look at the version in the trunk and if any other ranges are missing, please send a patch. Also feel free to look at those other ranges I left commented out in there. Bob Carpenter should recognize them. :)

Otis

----- Original Message ----
From: Steven Rowe
To: java-dev@lucene.apache.org
Sent: Sunday, August 13, 2006 9:36:06 AM
Subject: Re: [jira] Resolved: (LUCENE-478) CJK char list

Otis Gospodnetic (JIRA) wrote:
> [ http://issues.apache.org/jira/browse/LUCENE-478?page=all ]
>
> Otis Gospodnetic resolved LUCENE-478.
> -------------------------------------
>
> Resolution: Fixed
>
> Thanks, I committed Steven Rowe's patch, although it doesn't seem to
> fully match what he said in comments above (e.g. in his patch, I
> don't see the range he mentioned in 5.b).

Hi Otis,

Here's 5.b.:

5. Character ranges in John's list that are missing in StandardTokenizer.jj, and that should be added to the newly re-labeled section:

5.b. [ U+3d2e - U+4DB5 ] (non-chars [ U+4DB6 - U+4DBF ] excluded) CJK Ideograph Extension A. This range was introduced in Unicode 3.0.

And here's the corresponding change from the patch:

      "\u3300"-"\u337f",
-     "\u3400"-"\u3d2d",
+     "\u3400"-"\u4db5",
      "\u4e00"-"\u9fff",

I don't understand - it looks to me like the above change adds the range mentioned in 5.b. Are there other inconsistencies? (You said that 5.b. was an example.)

Steve

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
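
For anyone following the thread, the point Steve makes (that widening the existing range from U+3400-U+3D2D to U+3400-U+4DB5 already takes in the 5.b range U+3D2E-U+4DB5) can be checked with a few lines of Java. This is only a sketch: the class name CjkRangeCheck and the helper covers are made up for illustration and are not part of Lucene or StandardTokenizer.jj.

```java
// Illustrative sketch only; CjkRangeCheck and covers() are hypothetical
// names, not anything from Lucene or the StandardTokenizer grammar.
final class CjkRangeCheck {

    /** Returns true if the inclusive range [outerLo, outerHi] fully contains [innerLo, innerHi]. */
    static boolean covers(int outerLo, int outerHi, int innerLo, int innerHi) {
        return outerLo <= innerLo && innerHi <= outerHi;
    }

    public static void main(String[] args) {
        // Range from item 5.b: U+3D2E - U+4DB5
        final int lo5b = 0x3D2E;
        final int hi5b = 0x4DB5;

        // Grammar range before the patch: U+3400 - U+3D2D
        boolean before = covers(0x3400, 0x3D2D, lo5b, hi5b);
        // Grammar range after the patch: U+3400 - U+4DB5
        boolean after = covers(0x3400, 0x4DB5, lo5b, hi5b);

        System.out.println("before patch covers 5.b: " + before); // false
        System.out.println("after patch covers 5.b:  " + after);  // true

        // The non-characters U+4DB6 - U+4DBF stay outside the patched range.
        boolean nonCharsIncluded = covers(0x3400, 0x4DB5, 0x4DB6, 0x4DBF);
        System.out.println("non-chars included:      " + nonCharsIncluded); // false
    }
}
```

The old range ends at U+3D2D, one code point short of where 5.b begins, so the patch extends the existing entry rather than adding a separate U+3D2E-U+4DB5 line.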