Re: java.lang.Character lacuna #1 of 2

Xueming Shen Fri, 15 Apr 2011 00:16:43 -0700

Tom

I have filed CR/RFE 7036910: j.l.Character.toLowerCaseCharArray/toTitleCaseCharArray for this request.

The j.l.Character.toLowerCase/toUpperCase() javadoc suggests using String.toLower/UpperCase() for case mapping if you want 1:M mappings taken care of. And if you trust the API :-), which you should in this case, you will find that String.toLowerCase/toUpperCase() do handle 1:M correctly. Yes, we don't have a toLowerCaseCharArray() in j.l.c; however, as you noticed, there is ONLY one 1:M case mapping for toLowerCase, at least for now, and our String.toLowerCase() implementation "hardcodes" U+0130 as the special case. That said, I have yet to dig out the history of toUpperCaseCharArray... and I agree that, from an API design point of view, it would be more natural to have the pair.
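To make the simple vs. full distinction concrete, here is a small sketch contrasting the single-code-point Character mappings with the full mappings that String applies. The class and method names are illustrative only (not JDK API), and Locale.ROOT is assumed so that no locale-specific rules are involved.

```java
import java.util.Locale;

class CasingDemo {
    // Simple (1:1) mappings: Character can only ever return a single code point.
    static String simpleUpper(int cp) {
        return new String(Character.toChars(Character.toUpperCase(cp)));
    }

    static String simpleLower(int cp) {
        return new String(Character.toChars(Character.toLowerCase(cp)));
    }

    // Full (1:M) mappings: String consults the SpecialCasing data.
    static String fullUpper(int cp) {
        return new String(Character.toChars(cp)).toUpperCase(Locale.ROOT);
    }

    static String fullLower(int cp) {
        return new String(Character.toChars(cp)).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(simpleUpper(0x00DF));        // sharp s: no 1:1 uppercase, returned unchanged
        System.out.println(fullUpper(0x00DF));          // SS
        System.out.println(simpleLower(0x0130));        // i: the dot above is dropped
        System.out.println(fullLower(0x0130).length()); // 2: "i" followed by U+0307 COMBINING DOT ABOVE
    }
}
```

The same String calls also cover other SpecialCasing entries, e.g. U+FB02 (LATIN SMALL LIGATURE FL) uppercases to "FL".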

Yes, we do have RFE 6423415: (str) Add String.toTitleCase()

But given the nature of "title case", String#toTitleCase() might not be what you would like it to be. It would be strange if String#toTitleCase() did what String.toLower/UpperCase() do, that is, title-case-map every character inside the String; most people would probably expect it to title-case-map only the first character of the "title string". RFE 6423415 has very low priority for now.

It might be more reasonable to have j.l.Character.toTitleCaseCharArray() instead of j.l.String.toTitleCase().
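The "first character only" reading of title case can be sketched with the public simple mapping Character.toTitleCase(int). The helper titleCaseFirst is hypothetical, not a proposed signature, and it deliberately leaves the rest of the string untouched. Because only the 1:1 mapping is available, a code point such as U+FB02 (LATIN SMALL LIGATURE FL), whose full title case in SpecialCasing.txt is "Fl", passes through unchanged; that is the gap a toTitleCaseCharArray() would close.

```java
class TitleCaseFirst {
    // Hypothetical helper: title-case only the first code point of s,
    // using the simple (1:1) mapping because no public full mapping exists.
    static String titleCaseFirst(String s) {
        if (s.isEmpty()) {
            return s;
        }
        int first = s.codePointAt(0);
        int titled = Character.toTitleCase(first);
        return new StringBuilder(s.length())
                .appendCodePoint(titled)
                .append(s, Character.charCount(first), s.length())
                .toString();
    }

    public static void main(String[] args) {
        // U+01C6 (dz with caron) title-cases to U+01C5, which differs from its uppercase U+01C4.
        System.out.println(titleCaseFirst("\u01C6emal"));
        // U+FB02 has full title case "Fl", but the 1:1 mapping leaves it as is.
        System.out.println(titleCaseFirst("\uFB02our"));
    }
}
```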


-Sherman


On 4/14/2011 7:49 PM, Tom Christiansen wrote:

Sherman,

While I was fixing your docs for j.l.Character, I kept the Unicode
6.0 specs close at hand to make sure everything was up to date.  That's
how I was able to discover that one could safely update this comment
that noted that 1:M uppercasings happen only in the BMP:

      -        // As of Unicode 4.0, 1:M uppercasings only happen in the BMP.
      +        // As of Unicode 6.0, 1:M uppercasings only happen in the BMP.

I was very careful not to touch any code whatsoever--much though I
wanted to. :)

You see, you've got an obvious bug in that you have only a
toUpperCaseCharArray method to handle the full case mappings from
the Unicode SpecialCasing.txt file.  Clearly absent are the corresponding
methods for the other two cases, lower and title:

     toLowerCaseCharArray
     toTitleCaseCharArray

This is a serious problem, because it means you grant access to full
Unicode casing for only one of the three mappings.  And it is not as though
there is anything in String that will take care of this, either!  I was
shocked that there is no String#toTitleCase method.  And I'm mistrustful
of the String#toLowerCase method, since there is no toLowerCaseCharArray
method in j.l.Character for it to access.  So what does it do about
lowercasing this code point:

      İ   U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE

As you know, the lowercase for that code point is the two-
character string, "i\x{307}" (that is, "i\N{COMBINING DOT
ABOVE}").  That is the unconditional mapping in SpecialCasing.txt;
only the Turkish and Azeri entries override it, mapping to a plain "i".
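A quick way to see what String.toLowerCase(Locale) does with U+0130 is sketched below; the class and helper names are illustrative. SpecialCasing.txt gives U+0069 U+0307 as the unconditional lowercase and a separate Turkish/Azeri entry mapping it to plain U+0069, so the result is expected to differ between Locale.ROOT and a Turkish locale.

```java
import java.util.Locale;

class DottedCapitalI {
    // Lowercase U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE under the given locale.
    static String lowerCapitalIWithDot(Locale locale) {
        return "\u0130".toLowerCase(locale);
    }

    public static void main(String[] args) {
        // Unconditional entry: U+0069 U+0307, two chars.
        System.out.println(lowerCapitalIWithDot(Locale.ROOT).length());                  // 2
        // Turkish conditional entry: plain U+0069, one char.
        System.out.println(lowerCapitalIWithDot(Locale.forLanguageTag("tr")).length()); // 1
    }
}
```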

Here are the respective numbers of code points in Unicode that
have multichar mappings, the thing that is called "full" case
mapping in Unicode:

       1  lowercase
      42  titlecase
     102  uppercase
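These figures are for Unicode 6.0. A rough, brute-force way to reproduce the lowercase and uppercase rows against whatever Unicode version a given JDK ships is to ask String for the full mapping of every assigned code point and count the results longer than one code point. This is a sketch that assumes Locale.ROOT; the counts will drift with newer Unicode data, so they need not match the table exactly. The titlecase row cannot be reproduced this way, since neither String nor Character exposes a full title-case mapping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

class FullCasingCounts {
    // Code points whose full (String-level) mapping yields more than one code point.
    static List<Integer> expandingCodePoints(UnaryOperator<String> fullMapping) {
        List<Integer> result = new ArrayList<>();
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            int type = Character.getType(cp);
            if (type == Character.UNASSIGNED || type == Character.SURROGATE) {
                continue; // skip unassigned code points and lone surrogates
            }
            String mapped = fullMapping.apply(new String(Character.toChars(cp)));
            if (mapped.codePointCount(0, mapped.length()) > 1) {
                result.add(cp);
            }
        }
        return result;
    }

    static List<Integer> lowercaseExpanding() {
        return expandingCodePoints(s -> s.toLowerCase(Locale.ROOT));
    }

    static List<Integer> uppercaseExpanding() {
        return expandingCodePoints(s -> s.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println("lowercase: " + lowercaseExpanding().size());
        System.out.println("uppercase: " + uppercaseExpanding().size());
        // No titlecase line: there is no public full title-case mapping to query.
    }
}
```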

It's not really the *number* of code points affected that is the trouble.

Rather, it is Java's complete inability to access them.  It's like having a
"small" race condition.  It's a hole in the spec.  There really is no
reason to support full case mapping only for uppercase but refuse it for
the other two casings.  This is a very non-parallel situation, and I do not
understand why it exists; it should not.

Sherman, could you please file this bug report so that it gets to the right
queue, and then tell me its bug number?  Maybe this is something that could
be fixed in a future JDK8 project sometime.

--tom
