Hi

Tom Christiansen recently contributed an API doc update [1] for j.l.Character, as a follow-up to the discussion of Unicode support in j.l.Character/j.u.regex that we had back in January. In his doc patch, Tom recommends "downgrading" the doc for the j.l.Character.isLowerCase/isUpperCase(char/int) methods from "character" to "letter", to accurately describe what the j.l.c class currently does. The current j.l.c API spec and implementation of these methods are in fact only about "letters": they decide whether a character is lowercase or uppercase based solely on whether its general category type is LOWERCASE_LETTER or UPPERCASE_LETTER. The Unicode Standard, however, clearly defines lowercase as GC=Ll + Other_Lowercase and uppercase as GC=Lu + Other_Uppercase in ch04/4.2 Case [2]. As a result of this difference, the j.l.Character.isLowerCase/isUpperCase() methods don't work correctly for the Unicode Other_Lowercase/Other_Uppercase characters (201 of them, as of Unicode 6.0).
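To make the difference concrete, here is a minimal sketch that compares the letter-only rule described above with whatever Character.isLowerCase/isUpperCase return on the JDK it runs on. The class name CaseCheck and the two isLetterOnly* helpers are hypothetical, written only for this illustration; the sample code points U+2170/U+2160 (Roman numerals, GC=Nl) and U+24D0/U+24B6 (circled Latin letters, GC=So) are taken from the Other_Lowercase/Other_Uppercase lists of Unicode 6.0.

```java
public class CaseCheck {
    // Hypothetical helper: the letter-only rule, i.e. GC=Ll and nothing else.
    public static boolean isLetterOnlyLowerCase(int cp) {
        return Character.getType(cp) == Character.LOWERCASE_LETTER;
    }

    // Hypothetical helper: the letter-only rule, i.e. GC=Lu and nothing else.
    public static boolean isLetterOnlyUpperCase(int cp) {
        return Character.getType(cp) == Character.UPPERCASE_LETTER;
    }

    public static void main(String[] args) {
        // 'a'/'A' are ordinary letters; the rest carry Other_Lowercase or
        // Other_Uppercase but have a non-letter general category.
        int[] samples = { 'a', 'A', 0x2170, 0x2160, 0x24D0, 0x24B6 };
        for (int cp : samples) {
            System.out.printf(
                "U+%04X type=%d letterOnlyLower=%b isLowerCase=%b "
                    + "letterOnlyUpper=%b isUpperCase=%b%n",
                cp,
                Character.getType(cp),
                isLetterOnlyLowerCase(cp),
                Character.isLowerCase(cp),
                isLetterOnlyUpperCase(cp),
                Character.isUpperCase(cp));
        }
    }
}
```

Where the letterOnly* columns and the isLowerCase/isUpperCase columns disagree for the last four code points, the running JDK is applying the Unicode Standard definition; where they agree, it is applying the letter-only rule.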

I totally agree with Tom on this point. But instead of updating the j.l.c documentation in JDK 7 to describe the difference between the Java spec/implementation and the Unicode Standard definition, and leaving the real solution to JDK 8 (given that we all agree this is something we need to address in a future release if we don't address it now), I personally prefer to address the issue in one step: update both the spec and the implementation of these methods to match the Unicode Standard definition in JDK 7, if we can manage to squeeze this in at this very late stage of the release. It appears Tom also prefers this approach, if it is achievable.

So here is the webrev

http://cr.openjdk.java.net/~sherman/7037261/webrev

Besides these 4 isLowerCase/isUpperCase() methods, we also propose adding two new methods, isAlphabetic/isIdeographic, to support two other important Unicode character properties, which are specified in the Unicode Standard ch04/4.2 Case/4.11 [2] and defined in tr44 [3][4].
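As a rough illustration of how the two proposed methods might be used, the sketch below assumes they are added as static methods on j.l.Character taking an int code point, like the existing isLetter(int); it will only compile against a JDK that includes them. The class name PropertyCheck and the describe helper are hypothetical.

```java
public class PropertyCheck {
    // Hypothetical helper: summarizes the letter, Alphabetic and Ideographic
    // status of a single code point.
    public static String describe(int cp) {
        return String.format(
            "U+%04X letter=%b alphabetic=%b ideographic=%b",
            cp,
            Character.isLetter(cp),
            Character.isAlphabetic(cp),
            Character.isIdeographic(cp));
    }

    public static void main(String[] args) {
        // 'a': Latin letter; U+4E2D: CJK unified ideograph;
        // U+2160: ROMAN NUMERAL ONE (GC=Nl, Alphabetic but not a letter);
        // '1': decimal digit, not Alphabetic.
        int[] samples = { 'a', 0x4E2D, 0x2160, '1' };
        for (int cp : samples) {
            System.out.println(describe(cp));
        }
    }
}
```

U+2160 is the interesting sample: tr44 derives Alphabetic from Lu + Ll + Lt + Lm + Lo + Nl + Other_Alphabetic, so a letter number counts as Alphabetic even though isLetter() rejects it.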

Given the "incompatible" nature of the request (these 4 methods change behavior for those 201 code points), this proposal is under CCC review. Whether this request can make it into JDK 7 depends on the result of the CCC review and on whether the review can be finished before the final cutoff date for JDK 7.

Thanks,
Sherman

[1] http://mail.openjdk.java.net/pipermail/i18n-dev/2011-April/000358.html
[2] http://www.unicode.org/versions/Unicode6.0.0/ch04.pdf
[3] http://www.unicode.org/reports/tr44/#Alphabetic
[4] http://www.unicode.org/reports/tr44/#Ideographic
