Tom,
I have filed CR/RFE 7036910:
j.l.Character.toLowerCaseCharArray/toTitleCaseCharArray for this request.
The j.l.Character.toLowerCase()/toUpperCase() documentation suggests
using String.toLowerCase()/toUpperCase() for case mapping if you want
1:M mappings taken care of. And if you trust the API :-), which you
should in this case, you will find that
String.toLowerCase()/toUpperCase() do handle 1:M correctly. Yes, we
don't have a toLowerCaseCharArray() in j.l.Character; however, as you
noticed, there is ONLY one 1:M case mapping for toLowerCase, at least
for now, and our String.toLowerCase() implementation hardcodes U+0130
as the special case. That said, I have yet to dig out the history of
toUpperCaseCharArray... and I agree that, from an API design point of
view, it would be more natural to have the pair.
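As a quick check of the U+0130 handling described above, here is a minimal sketch that compares the simple mapping from Character with the full mapping from String. The class and method names (CaseMappingDemo, simpleLower, fullLower) are made up for illustration, and Locale.ROOT is an assumption chosen to keep Turkish-style locale rules out of the result.

```java
import java.util.Locale;

// Illustrative only: CaseMappingDemo and its method names are made up for this note.
final class CaseMappingDemo {
    // Simple (1:1) mapping from Character: the result is always a single code point.
    static int simpleLower(int codePoint) {
        return Character.toLowerCase(codePoint);
    }

    // Full mapping from String: the result may be longer than the input.
    // Locale.ROOT keeps locale-specific (e.g. Turkish) rules out of the picture.
    static String fullLower(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String dottedI = "\u0130"; // LATIN CAPITAL LETTER I WITH DOT ABOVE
        System.out.printf("simple: U+%04X%n", simpleLower(dottedI.codePointAt(0)));
        for (char c : fullLower(dottedI).toCharArray()) {
            System.out.printf("full:   U+%04X%n", (int) c);
        }
    }
}
```

The simple mapping can only return one code point, so the combining dot above is dropped there, while the String-level call returns both characters.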
Yes, we do have RFE 6423415: (str) Add String.toTitleCase().
But given the nature of "title case", String#toTitleCase() might not be
what you would like it to be. It would be strange if
String#toTitleCase() did what String.toLowerCase()/toUpperCase() do,
title-case-mapping every character in the String; most people would
probably expect it to title-case-map only the first character of the
"title string". RFE 6423415 has very low priority for now.
It might be more reasonable to have j.l.Character.toTitleCaseCharArray()
instead of j.l.String.toTitleCase().
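To illustrate the "first character only" expectation mentioned above, here is a rough sketch built on the existing Character.toTitleCase(int). The names TitleCaseSketch and titleCaseFirst are hypothetical, not JDK API, and this is not what RFE 6423415 specifies. It relies on the simple 1:1 mapping, so it would not apply any 1:M title-case mapping.

```java
// Illustrative only: TitleCaseSketch and titleCaseFirst are hypothetical names,
// not JDK API.
final class TitleCaseSketch {
    // Title-cases the first code point and leaves the rest of the string alone.
    // Uses the simple 1:1 mapping, so 1:M title-case mappings are not applied.
    static String titleCaseFirst(String s) {
        if (s.isEmpty()) {
            return s;
        }
        int first = s.codePointAt(0);
        int rest = Character.charCount(first); // index where the remainder starts
        return new StringBuilder(s.length())
                .appendCodePoint(Character.toTitleCase(first))
                .append(s, rest, s.length())
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(titleCaseFirst("hello world"));
    }
}
```

Working with code points rather than chars keeps the sketch safe for supplementary characters at the start of the string.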
-Sherman
On 4/14/2011 7:49 PM, Tom Christiansen wrote:
Sherman,
While I was fixing your docs for j.l.Character, I kept the Unicode
6.0 specs close at hand to make sure everything was up to date. That's
how I was able to discover that one could safely update this comment
that noted that 1:M uppercasings happen only in the BMP:
- // As of Unicode 4.0, 1:M uppercasings only happen in the BMP.
+ // As of Unicode 6.0, 1:M uppercasings only happen in the BMP.
I was very careful not to touch any code whatsoever--much though I
wanted to. :)
You see, you've got an obvious bug in that you have only a
toUpperCaseCharArray method to handle the full case mappings from the
Unicode SpecialCasing.txt file. Clearly absent are corresponding
methods for the other two cases, lower and title:
toLowerCaseCharArray
toTitleCaseCharArray
This is a serious problem, because it means you grant access to full
Unicode casing for only one of the three mappings. And it is not as though
there is anything in String that will take care of this, either! I was
shocked that there is no String#toTitleCase method. And I'm mistrustful
of the String#toLowerCase method, since there is no toLowerCaseCharArray
method in j.l.Character for it to access. So what does it do about
lowercasing this code point:
İ U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
As you know, the lowercase for that code point is the two-
character string, "i\x{307}" (that is, "i\N{COMBINING DOT
ABOVE}"), and this is true no matter the locale; see
SpecialCasing.txt.
Here are the respective number of code points in Unicode that
have multichar mappings, the thing that is called "full" case
mapping in Unicode:
1 lowercase
42 titlecase
102 uppercase
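One way to see what a given JDK actually exposes for the lowercase and uppercase rows above is to scan every code point through the String-level methods and count the results that expand to more than one code point. This is only a sketch: FullCasingScan and countExpanding are hypothetical names, Locale.ROOT is an assumption, and the totals depend on the Unicode version the running JDK implements, so they need not match the figures quoted here. Title case is left out because String has no title-case method to call.

```java
import java.util.Locale;

// Illustrative only: FullCasingScan is a hypothetical helper, not JDK API.
final class FullCasingScan {
    // Counts code points whose String-level uppercase (or lowercase) mapping
    // expands to more than one code point under Locale.ROOT.
    static int countExpanding(boolean upper) {
        int count = 0;
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) {
                continue; // skip lone surrogates
            }
            String s = new String(Character.toChars(cp));
            String mapped = upper ? s.toUpperCase(Locale.ROOT)
                                  : s.toLowerCase(Locale.ROOT);
            if (mapped.codePointCount(0, mapped.length()) > 1) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("lowercase 1:M: " + countExpanding(false));
        System.out.println("uppercase 1:M: " + countExpanding(true));
    }
}
```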
It's not really the *number* of code points affected that is the trouble.
Rather, it is Java's complete inability to access them. It's like having a
"small" race condition. It's a hole in the spec. There really is no
reason to support full case mapping only for uppercase but refuse it for
the other two casings. This is a very non-parallel situation, and I do not
understand why it exists; it should not.
Sherman, could you please file this bug report so that it gets to the right
queue, and then tell me its bug number? Maybe this is something that could
be fixed in a future JDK8 project sometime.
--tom