Re: RFR: 8285255: refine StringLatin1.regionMatchesCI_UTF16 [v3]

Claes Redestad Mon, 25 Apr 2022 08:13:51 -0700

On Wed, 20 Apr 2022 21:08:19 GMT, XenoAmess <[email protected]> wrote:


>> some thoughts after watching 8285001: Simplify StringLatin1.regionMatches  
>> https://github.com/openjdk/jdk/pull/8292/
>> 
>>             if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
>>                 continue;
>>             }
>> 
>> should be changed to 
>> 
>>             if (((u1 == c1) ? CharacterDataLatin1.instance.toLowerCase(c1) : c1) == Character.toLowerCase(u2)) {
>>                 continue;
>>             }
>> 
>> as:
>> 
>> 1. c1 is LATIN1, so CharacterDataLatin1.instance.toLowerCase seems faster.
>> 2. Because c1 is LATIN1, if u1 != c1 then c1 is already lowercase and 
>> does not need a lowercase calculation.
>
> XenoAmess has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   remove = check

Unfortunately, this leads to an exception in case-insensitive `regionMatches` 
between a Latin-1 string that contains either `\u00b5` or `\u00ff` (the two 
Latin-1 code points whose upper-case code points lie outside the Latin-1 
range) and a UTF-16 string:


jshell> "\u00b5".regionMatches(true, 0, "\u0100", 0, 1)
|  Exception java.lang.ArrayIndexOutOfBoundsException: Index 924 out of bounds for length 256
|        at CharacterDataLatin1.getProperties (CharacterDataLatin1.java:74)
|        at CharacterDataLatin1.toLowerCase (CharacterDataLatin1.java:140)
|        at StringLatin1.regionMatchesCI_UTF16 (StringLatin1.java:420)
|        at String.regionMatches (String.java:2238)
|        at (#4:1)
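
The two code points can be checked independently of the patch. The following 
is a minimal sketch (the class name `Latin1UpperCaseCheck` and its helper 
methods are made up for illustration and are not part of the JDK sources or 
the PR). It uses only `Character.toUpperCase(int)` to list the Latin-1 code 
points whose upper-case mapping falls outside 0x00..0xFF:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, for illustration only.
class Latin1UpperCaseCheck {

    // True if the code point fits in a single Latin-1 byte (0x00..0xFF).
    static boolean isLatin1(int cp) {
        return cp >= 0 && cp <= 0xFF;
    }

    // Collects every Latin-1 code point whose upper-case mapping
    // is no longer a Latin-1 code point.
    static List<Integer> escapingCodePoints() {
        List<Integer> result = new ArrayList<>();
        for (int cp = 0; cp <= 0xFF; cp++) {
            if (!isLatin1(Character.toUpperCase(cp))) {
                result.add(cp);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (int cp : escapingCodePoints()) {
            int upper = Character.toUpperCase(cp);
            // Print the mapping in hex plus the decimal value of the upper-case form.
            System.out.printf("U+%04X -> U+%04X (decimal %d)%n", cp, upper, upper);
        }
    }
}
```

`\u00b5` maps to U+039C, which is 924 in decimal, the same value as the index 
reported in the exception above. `\u00ff` maps to U+0178. Any lookup into a 
256-entry Latin-1 table therefore has to guard against these upper-case values.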

-------------

PR: https://git.openjdk.java.net/jdk/pull/8308
