Review request: 7037261: j.l.Character.isLowerCase/isUpperCase need to match the Unicode Standard definition

Xueming Shen Tue, 19 Apr 2011 17:23:03 -0700

Hi

Tom Christiansen recently contributed an API doc update [1] for j.l.Character, as a follow-up to the Unicode support discussion in j.l.Character/j.u.regex we had back in January. In his doc patch, Tom recommended to "downgrade" the doc for the j.l.Character.isLowerCase/isUpperCase(char/int) methods from "character" to "letter", to accurately describe/specify what we currently really do in the j.l.c class: the current j.l.c API spec and implementation for these methods are in fact only about "letter", deciding whether a character is lowercase or uppercase solely based on whether its general category type is LOWERCASE_LETTER/UPPERCASE_LETTER. The Unicode Standard, however, clearly specifies its definition of lowercase/uppercase for a character as GC=Ll/Lu + Other_Lowercase/Other_Uppercase in ch04/4.2 Case [2]. As a result of this difference, the j.l.Character.isLowerCase/isUpperCase() methods don't work correctly for the Unicode Other_Lowercase/Other_Uppercase characters (201 of them, as of Unicode 6.0).
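To make the difference concrete, here is a minimal sketch of the category-only rule next to whatever j.l.Character does on the JDK it runs on. The class and method names are hypothetical and are not part of the webrev; the sample code points (small/capital Roman numeral one, circled Latin small/capital letter A) are ones that carry Other_Lowercase/Other_Uppercase without being GC=Ll/Lu.

```java
// Illustrative sketch only: CaseDefinitionDemo and its helpers are
// hypothetical names, not code from the webrev.
class CaseDefinitionDemo {

    // Category-only rule: lowercase if and only if GC == Ll.
    static boolean isLowerCaseByCategoryOnly(int codePoint) {
        return Character.getType(codePoint) == Character.LOWERCASE_LETTER;
    }

    // Category-only rule: uppercase if and only if GC == Lu.
    static boolean isUpperCaseByCategoryOnly(int codePoint) {
        return Character.getType(codePoint) == Character.UPPERCASE_LETTER;
    }

    public static void main(String[] args) {
        int[] samples = {
            'a',     // GC=Ll
            'A',     // GC=Lu
            0x2170,  // SMALL ROMAN NUMERAL ONE: GC=Nl, Other_Lowercase
            0x2160,  // ROMAN NUMERAL ONE: GC=Nl, Other_Uppercase
            0x24D0,  // CIRCLED LATIN SMALL LETTER A: GC=So, Other_Lowercase
            0x24B6   // CIRCLED LATIN CAPITAL LETTER A: GC=So, Other_Uppercase
        };
        for (int cp : samples) {
            // The last two columns show what the running JDK's Character reports.
            System.out.printf("U+%04X category-only lower=%b upper=%b | Character lower=%b upper=%b%n",
                    cp,
                    isLowerCaseByCategoryOnly(cp),
                    isUpperCaseByCategoryOnly(cp),
                    Character.isLowerCase(cp),
                    Character.isUpperCase(cp));
        }
    }
}
```

Under the category-only rule the four non-letter samples are neither lowercase nor uppercase, which is exactly where the Unicode Standard definition disagrees.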

I totally agree with Tom on this check. But instead of updating the j.l.c documentation to describe the difference between the Java spec/implementation and the Unicode Standard definition in JDK 7 and leaving the real solution to JDK 8 (given we all agree this is something we need to address in a future release if we don't address it now), personally I prefer to address the issue in one step: update both the spec and the implementation of these methods to match the Unicode Standard definition in JDK 7, if we can manage to squeeze this in at this very late stage of the release. It appears Tom prefers this approach as well, if it is achievable.

So here is the webrev

http://cr.openjdk.java.net/~sherman/7037261/webrev

Other than these 4 isLowerCase/isUpperCase() methods, we also propose to add two new methods to support another two important Unicode character properties, isAlphabetic/isIdeographic, which are specified in Unicode Standard ch04/4.2 Case/4.11 [2] and defined in tr44 [3][4].
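As a rough illustration of what the two proposed properties distinguish, the sketch below compares them with the existing isLetter. It assumes a JDK on which Character.isAlphabetic(int) and Character.isIdeographic(int) are available (they are present in java.lang.Character as of Java 7); the class name and the describe helper are hypothetical.

```java
// Illustrative sketch only: AlphabeticIdeographicDemo and describe() are
// hypothetical names used for this example.
class AlphabeticIdeographicDemo {

    // Summarizes three related properties for one code point.
    static String describe(int codePoint) {
        return String.format("U+%04X letter=%b alphabetic=%b ideographic=%b",
                codePoint,
                Character.isLetter(codePoint),
                Character.isAlphabetic(codePoint),
                Character.isIdeographic(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(describe('a'));     // ordinary Latin letter
        System.out.println(describe(0x2170));  // SMALL ROMAN NUMERAL ONE (GC=Nl)
        System.out.println(describe(0x4E2D));  // CJK unified ideograph
    }
}
```

The Roman numeral is the interesting case: it is not a letter by general category, but it is Alphabetic in the Unicode sense.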

Given the "incompatible" nature of the request (these 4 methods change behavior for those 201 code points), this proposal is under CCC review. Whether or not this request can make it into JDK 7 depends on the CCC review result and whether the review can be finished before the final cutoff schedule of JDK 7.
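For anyone who wants to see which code points are affected on a given build, a scan along the following lines should work. This is only a sketch with hypothetical names: it lists the code points where Character.isLowerCase/isUpperCase on the running JDK disagree with the category-only rule. The totals depend on the Unicode version bundled with that JDK, so they need not add up to 201, and on a build that still uses the category-only implementation both lists would be empty.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: CaseChangeScan and its methods are hypothetical names.
class CaseChangeScan {

    // Code points where Character.isLowerCase differs from "GC == Ll".
    static List<Integer> affectedLowercase() {
        List<Integer> result = new ArrayList<>();
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            boolean categoryOnly = Character.getType(cp) == Character.LOWERCASE_LETTER;
            if (Character.isLowerCase(cp) != categoryOnly) {
                result.add(cp);
            }
        }
        return result;
    }

    // Code points where Character.isUpperCase differs from "GC == Lu".
    static List<Integer> affectedUppercase() {
        List<Integer> result = new ArrayList<>();
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            boolean categoryOnly = Character.getType(cp) == Character.UPPERCASE_LETTER;
            if (Character.isUpperCase(cp) != categoryOnly) {
                result.add(cp);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> lower = affectedLowercase();
        List<Integer> upper = affectedUppercase();
        System.out.println("lowercase differences: " + lower.size());
        System.out.println("uppercase differences: " + upper.size());
        for (int cp : upper) {
            System.out.printf("U+%04X%n", cp);
        }
    }
}
```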

Thanks,
Sherman

[1] http://mail.openjdk.java.net/pipermail/i18n-dev/2011-April/000358.html
[2] http://www.unicode.org/versions/Unicode6.0.0/ch04.pdf
[3] http://www.unicode.org/reports/tr44/#Alphabetic
[4] http://www.unicode.org/reports/tr44/#Ideographic
