Hi
Tom Christiansen recently contributed an API doc update [1] for
j.l.Character, as a follow-up to the Unicode support discussion on
j.l.Character/j.u.regex we had back in January. In his doc patch, Tom
recommends "downgrading" the doc for the
j.l.Character.isLowerCase/isUpperCase(char/int) methods from
"character" to "letter", to accurately describe what we currently do
in the j.l.Character class. The current API spec and implementation of
these methods are in fact only about "letters": they decide whether a
character is lowercase or uppercase based solely on whether its general
category type is LOWERCASE_LETTER or UPPERCASE_LETTER. The Unicode
Standard, however, clearly defines lowercase/uppercase of a character
as GC=Ll/Lu + Other_Lowercase/Other_Uppercase in ch04/4.2 Case [2]. As
a result of this difference, the j.l.Character.isLowerCase/isUpperCase()
methods don't work correctly for the Unicode
Other_Lowercase/Other_Uppercase characters (201 of them, as of
Unicode 6.0).
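
To make the gap concrete, here is a minimal sketch of a
general-category-only check next to the j.l.Character methods. The
class and helper names (CaseCheckSketch, isLowerCaseByCategory,
isUpperCaseByCategory) are made up for illustration and are not part of
the webrev; the two sample code points are taken from the
Other_Lowercase/Other_Uppercase ranges of Unicode 6.0.

```java
// Illustration only: CaseCheckSketch and its helpers are hypothetical names.
class CaseCheckSketch {

    // General-category-only rule: lowercase means GC=Ll (LOWERCASE_LETTER).
    static boolean isLowerCaseByCategory(int codePoint) {
        return Character.getType(codePoint) == Character.LOWERCASE_LETTER;
    }

    // General-category-only rule: uppercase means GC=Lu (UPPERCASE_LETTER).
    static boolean isUpperCaseByCategory(int codePoint) {
        return Character.getType(codePoint) == Character.UPPERCASE_LETTER;
    }

    public static void main(String[] args) {
        int smallRomanOne = 0x2170; // SMALL ROMAN NUMERAL ONE: GC=Nl, Other_Lowercase
        int romanOne = 0x2160;      // ROMAN NUMERAL ONE: GC=Nl, Other_Uppercase

        // Both are false under the general-category-only rule,
        // because the general category is Nl, not Ll/Lu.
        System.out.println(isLowerCaseByCategory(smallRomanOne));
        System.out.println(isUpperCaseByCategory(romanOne));

        // What the running JDK reports; the result depends on whether
        // it also consults Other_Lowercase/Other_Uppercase.
        System.out.println(Character.isLowerCase(smallRomanOne));
        System.out.println(Character.isUpperCase(romanOne));
    }
}
```

On a JDK that only looks at the general category, all four lines should
print false; under the Unicode Standard definition the last two would be
true.
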
I totally agree with Tom's assessment. But instead of updating the
j.l.Character doc in JDK 7 to describe the difference between the Java
spec/implementation and the Unicode Standard definition, and leaving
the real solution to JDK 8 (given we all agree this is something we
need to address in a future release if we don't address it now), I
personally prefer to address the issue in one step: update both the
spec and the implementation of these methods to match the Unicode
Standard definition in JDK 7, if we can manage to squeeze this in at
this very late stage of the release. It appears Tom also prefers this
approach, if it is achievable.
So here is the webrev
http://cr.openjdk.java.net/~sherman/7037261/webrev
Besides these 4 isLowerCase/isUpperCase() methods, we also propose to
add two new methods, isAlphabetic and isIdeographic, to support two
other important Unicode character properties, which are specified in
Unicode Standard ch04/4.2 Case/4.11 [2] and defined in tr44 [3][4].
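
As a rough usage sketch for the two new methods, assuming they end up
as static Character.isAlphabetic(int) and Character.isIdeographic(int)
(the exact signatures are an assumption here; see the webrev for the
actual proposal). PropertySketch and describe are hypothetical names
used only for this example.

```java
// Illustration only: PropertySketch and describe are hypothetical names;
// isAlphabetic(int)/isIdeographic(int) are assumed signatures.
class PropertySketch {

    // Summarizes three properties of a code point in one line.
    static String describe(int codePoint) {
        return "letter=" + Character.isLetter(codePoint)
             + " alphabetic=" + Character.isAlphabetic(codePoint)
             + " ideographic=" + Character.isIdeographic(codePoint);
    }

    public static void main(String[] args) {
        // SMALL ROMAN NUMERAL ONE (GC=Nl): not a letter, but Alphabetic,
        // since tr44 derives Alphabetic from Lu+Ll+Lt+Lm+Lo+Nl+Other_Alphabetic.
        System.out.println(describe(0x2170));

        // CJK UNIFIED IDEOGRAPH-4E2D (GC=Lo): letter, Alphabetic and Ideographic.
        System.out.println(describe(0x4E2D));
    }
}
```

The point of the first case is that Alphabetic is wider than
isLetter(): it also covers GC=Nl and the Other_Alphabetic code points.
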
Given the "incompatible" nature of the request (these 4 methods change
behavior for those 201 code points), this proposal is under CCC review.
Whether or not this request can make it into JDK 7 depends on the CCC
review result and on whether the review can be finished before the
final cutoff of the JDK 7 schedule.
Thanks,
Sherman
[1] http://mail.openjdk.java.net/pipermail/i18n-dev/2011-April/000358.html
[2] http://www.unicode.org/versions/Unicode6.0.0/ch04.pdf
[3] http://www.unicode.org/reports/tr44/#Alphabetic
[4] http://www.unicode.org/reports/tr44/#Ideographic