Re: RFR: 8248655: Support supplementary characters in String case insensitive operations

Jim Laskey Wed, 15 Jul 2020 11:47:34 -0700

Joe: This is a defensive approach that I believe has minimal cost.

    public static boolean isHighSurrogate(char ch) {
        // Help VM constant-fold; MAX_HIGH_SURROGATE + 1 == MIN_LOW_SURROGATE
        return ch >= MIN_HIGH_SURROGATE && ch < (MAX_HIGH_SURROGATE + 1);
    }
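
For illustration only, the early return Joe describes could be sketched roughly as below. The class and helper names are hypothetical and are not taken from the webrev. The shortcut is only valid under the assumption, discussed in the quoted text, that case mappings never cross the boundary between the BMP and the supplementary planes.

```java
public class SurrogateEarlyOut {

    // Hypothetical helper (not part of the proposed changeset):
    // true when exactly one of the two chars is a high surrogate.
    // ASSUMPTION: case mappings never cross the BMP/supplementary boundary,
    // so such a pair could never compare equal ignoring case.
    static boolean onlyOneIsHighSurrogate(char c1, char c2) {
        return Character.isHighSurrogate(c1) != Character.isHighSurrogate(c2);
    }

    public static void main(String[] args) {
        // BMP letter vs. high surrogate: a caller could return "no match" early.
        System.out.println(onlyOneIsHighSurrogate('a', '\uD801'));
        // Two high surrogates: the full code point comparison is still needed.
        System.out.println(onlyOneIsHighSurrogate('\uD801', '\uD83D'));
        // Two BMP letters: the ordinary case-insensitive comparison applies.
        System.out.println(onlyOneIsHighSurrogate('a', 'B'));
    }
}
```

Since isHighSurrogate is just two comparisons, the check itself costs very little either way.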



> On Jul 15, 2020, at 3:32 PM, naoto.s...@oracle.com wrote:
> 
> Hi Joe,
> 
> Thank you for your review.
> 
> On 7/15/20 10:57 AM, Joe Wang wrote:
>> Hi Naoto,
>> In StringUTF16.java, if one char is a high surrogate and the other is not, 
>> you could return early without going through the rest of the process. It is 
>> probably not significant, as cp1 and cp2 and/or u1 and u2 won't be equal 
>> anyway, but it could skip a couple of toCodePoint/toUpperCase/toLowerCase 
>> calls.
> 
> Yes, that is correct as of now, but it relies on the assumption that case 
> mappings do not cross the boundary between the BMP and the supplementary 
> planes. I could not find any description stating whether that is guaranteed, 
> so I took the safe approach.
> 
> Naoto
> 
>> -Joe
>> On 7/15/20 9:00 AM, naoto.s...@oracle.com wrote:
>>> Hello,
>>> 
>>> Please review the fix to the following issues:
>>> 
>>> https://bugs.openjdk.java.net/browse/JDK-8248655
>>> https://bugs.openjdk.java.net/browse/JDK-8248434
>>> 
>>> The proposed changeset and its CSR are located at:
>>> 
>>> https://cr.openjdk.java.net/~naoto/8248655.8248434/webrev.00/
>>> https://bugs.openjdk.java.net/browse/JDK-8248664
>>> 
>>> A bug was filed against SimpleDateFormat (8248434) where case-insensitive 
>>> date format/parse failed in some of the new locales in JDK 15. The root 
>>> cause was that the case-insensitive String.regionMatches() method did not 
>>> work with supplementary characters. The problem is that the method's spec 
>>> does not account for case mappings of supplementary characters, possibly 
>>> because this was overlooked in JSR 204, "Unicode Supplementary Character 
>>> Support". Similar behavior is observed in the two other case-insensitive 
>>> methods, i.e., compareToIgnoreCase() and equalsIgnoreCase().
>>> 
>>> The fix is straightforward: compare strings on a code point basis instead 
>>> of a code unit (16-bit "char") basis. Technically this change introduces a 
>>> backward incompatibility, but I believe it is an incompatibility with wrong 
>>> behavior that does not match what those methods are expected to do.
>>> 
>>> Naoto
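
The difference between code unit and code point comparison described in the review request can be sketched as follows. This is a minimal illustration, not the code in the webrev: the class and method names are hypothetical, and the Deseret letters U+10400 and U+10428 are chosen here only as a convenient upper/lower case pair outside the BMP.

```java
public class CodePointCaseCompare {

    // Code unit comparison: one 16-bit char at a time.
    // Surrogates have no case mapping, so supplementary letters never fold.
    static boolean equalsIgnoreCaseByChar(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        for (int i = 0; i < a.length(); i++) {
            char c1 = a.charAt(i);
            char c2 = b.charAt(i);
            if (c1 == c2) continue;
            char u1 = Character.toUpperCase(c1);
            char u2 = Character.toUpperCase(c2);
            if (u1 == u2) continue;
            if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) continue;
            return false;
        }
        return true;
    }

    // Code point comparison: surrogate pairs are decoded before case folding.
    static boolean equalsIgnoreCaseByCodePoint(String a, String b) {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            int cp1 = a.codePointAt(i);
            int cp2 = b.codePointAt(j);
            i += Character.charCount(cp1);
            j += Character.charCount(cp2);
            if (cp1 == cp2) continue;
            int u1 = Character.toUpperCase(cp1);
            int u2 = Character.toUpperCase(cp2);
            if (u1 == u2) continue;
            if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) continue;
            return false;
        }
        // Both strings must be fully consumed to be considered equal.
        return i == a.length() && j == b.length();
    }

    public static void main(String[] args) {
        String upper = new String(Character.toChars(0x10400)); // DESERET CAPITAL LETTER LONG I
        String lower = new String(Character.toChars(0x10428)); // DESERET SMALL LETTER LONG I
        System.out.println(equalsIgnoreCaseByChar(upper, lower));      // false
        System.out.println(equalsIgnoreCaseByCodePoint(upper, lower)); // true
    }
}
```

The two methods agree on BMP-only input; they differ only once a surrogate pair with a case mapping is involved.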
