Re: RFR: 8248655: Support supplementary characters in String case insensitive operations

James Laskey Wed, 15 Jul 2020 12:43:28 -0700

👍

📱


> On Jul 15, 2020, at 4:32 PM, Joe Wang <[email protected]> wrote:
> 
> Jim: I was referring to the rest of the process (the calls to 
> toCodePoint/toUpperCase/toLowerCase), not isHighSurrogate itself. But Roger 
> has a more comprehensive review on performance, and Naoto is planning on 
> preparing a JMH test.
> 
> -Joe
> 
>> On 7/15/2020 11:46 AM, Jim Laskey wrote:
>> Joe: This is a defensive approach that I believe has minimal cost.
>> 
>>     public static boolean isHighSurrogate(char ch) {
>>         // Help VM constant-fold; MAX_HIGH_SURROGATE + 1 == MIN_LOW_SURROGATE
>>         return ch >= MIN_HIGH_SURROGATE && ch < (MAX_HIGH_SURROGATE + 1);
>>     }
>> 
>> 
>>>> On Jul 15, 2020, at 3:32 PM, [email protected] wrote:
>>> 
>>> Hi Joe,
>>> 
>>> Thank you for your review.
>>> 
>>> On 7/15/20 10:57 AM, Joe Wang wrote:
>>>> Hi Naoto,
>>>> In StringUTF16.java, if one is isHighSurrogate and the other not, you may 
>>>> quickly return without going through the rest of the process, probably not 
>>>> significant as cp1 and cp2 and/or u1 and u2 won't be equal anyways. But it 
>>>> could skip a couple of toCodePoint/toUpperCase/toLowerCase calls.
>>> Yes, that is correct as of now, which is based on the assumption that case 
>>> mappings do not cross BMP and supplementary planes boundary. I could not 
>>> find any description where that's given or not. So I just took it to be 
>>> safe.
>>> 
>>> Naoto
>>> 
>>>> -Joe
>>>> On 7/15/20 9:00 AM, [email protected] wrote:
>>>>> Hello,
>>>>> 
>>>>> Please review the fix to the following issues:
>>>>> 
>>>>> https://bugs.openjdk.java.net/browse/JDK-8248655
>>>>> https://bugs.openjdk.java.net/browse/JDK-8248434
>>>>> 
>>>>> The proposed changeset and its CSR are located at:
>>>>> 
>>>>> https://cr.openjdk.java.net/~naoto/8248655.8248434/webrev.00/
>>>>> https://bugs.openjdk.java.net/browse/JDK-8248664
>>>>> 
>>>>> A bug was filed against SimpleDateFormat (8248434) where case-insensitive 
>>>>> date format/parse failed in some of the new locales in JDK15. The root 
>>>>> cause was that case-insensitive String.regionMatches() method did not 
>>>>> work with supplementary characters. The problem is that the method's spec 
>>>>> does not expect case mappings of supplementary characters, possibly 
>>>>> because it was overlooked in the first place, JSR 204 - "Unicode 
>>>>> Supplementary Character support". Similar behavior is observed in other 
>>>>> two case-insensitive methods, i.e., compareToIgnoreCase() and 
>>>>> equalsIgnoreCase().
>>>>> 
>>>>> The fix is straightforward to compare strings by code point basis, 
>>>>> instead of code unit (16bit "char") basis. Technically this change will 
>>>>> introduce a backward incompatibility, but I believe it is an 
>>>>> incompatibility to wrong behavior, not true to the meaning of those 
>>>>> methods' expectations.
>>>>> 
>>>>> Naoto
>

Re: RFR: 8248655: Support supplementary characters in String case insensitive operations

Reply via email to