Re: RFR: 8302871: Speed up StringLatin1.regionMatchesCI

David Schlosnagle Mon, 20 Feb 2023 05:19:57 -0800

On Sat, 18 Feb 2023 09:21:25 GMT, Eirik Bjorsnos <[email protected]> wrote:


> This PR suggests we can speed up `StringLatin1.regionMatchesCI` by applying 
> 'the oldest ASCII trick in the book'.
> 
> The new static method `CharacterDataLatin1.equalsIgnoreCase` compares two 
> latin1 bytes for equality ignoring case. `StringLatin1.regionMatchesCI` is 
> updated to use `equalsIgnoreCase`.
> 
> To verify the correctness of `equalsIgnoreCase`, a new test is added to 
> `EqualsIgnoreCase` with an exhaustive verification that all 256x256 latin1 
> code point pairs have an `equalsIgnoreCase` result consistent with 
> `Character.toUpperCase` and `Character.toLowerCase`.
> 
> Performance is tested for matching and mismatching cases of code point pairs 
> picked from the ASCII letter, ASCII number and latin1 letter ranges. Results 
> in the first comment below.

src/java.base/share/classes/java/lang/CharacterDataLatin1.java.template line 181:

> 179:          return ( U <= 'Z' // In range A-Z
> 180:                  || (U >= 0xC0 && U <= 0XDE && U != 0xD7)) // ..or A-grave-Thorn, excl. multiplication
> 181:                  && U == (b2 & 0xDF); // b2 has same uppercase

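I'm curious whether the order of the comparisons could affect performance to a 
small degree. For example, it might be interesting to benchmark permutations 
like the one below, which compares the uppercased bytes first so that a b2 
with a different uppercase is rejected by the short circuit before any range 
check runs.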

Suggestion:

         // uppercase b1 using 'the oldest ASCII trick in the book'
         int U = b1 & 0xDF;
         return (U == (b2 & 0xDF))
             && ((U >= 'A' && U <= 'Z') // In range A-Z
                 || (U >= 0xC0 && U <= 0XDE && U != 0xD7)); // ..or A-grave-Thorn, excl. multiplication
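
Since `&&` has no side effects here, reordering should only change which 
operand short-circuits, not the result. A minimal sketch for checking that 
before benchmarking is below. The class and method names are hypothetical, and 
the `b1 == b2` fast path and the `U >= 'A'` lower bound are assumptions about 
the parts of the method not shown in the excerpt above. It runs both orderings 
over all 256x256 latin1 pairs and compares them against a reference built on 
`Character.toUpperCase` and `Character.toLowerCase`.

```java
// Hypothetical harness: names are illustrative and not taken from the PR.
final class EqualsIgnoreCaseOrderings {

    // Ordering from the PR excerpt: range checks first, equality last.
    // ASSUMPTION: the b1 == b2 fast path and the U >= 'A' bound stand in
    // for the lines of the real method that the excerpt does not show.
    static boolean rangeFirst(byte b1, byte b2) {
        if (b1 == b2) {
            return true;
        }
        int U = b1 & 0xDF; // clear bit 5: maps a-z and 0xE0-0xFE onto uppercase
        return ((U >= 'A' && U <= 'Z')
                || (U >= 0xC0 && U <= 0xDE && U != 0xD7))
                && U == (b2 & 0xDF);
    }

    // Suggested ordering: compare uppercased bytes first, then range checks.
    static boolean equalityFirst(byte b1, byte b2) {
        if (b1 == b2) {
            return true;
        }
        int U = b1 & 0xDF;
        return (U == (b2 & 0xDF))
                && ((U >= 'A' && U <= 'Z')
                    || (U >= 0xC0 && U <= 0xDE && U != 0xD7));
    }

    // Reference answer based on the Character case-mapping methods.
    static boolean reference(int c1, int c2) {
        return Character.toUpperCase(c1) == Character.toUpperCase(c2)
                || Character.toLowerCase(c1) == Character.toLowerCase(c2);
    }

    // Counts latin1 pairs where either ordering differs from the reference.
    static int countMismatches() {
        int mismatches = 0;
        for (int c1 = 0; c1 < 256; c1++) {
            for (int c2 = 0; c2 < 256; c2++) {
                boolean expected = reference(c1, c2);
                if (rangeFirst((byte) c1, (byte) c2) != expected
                        || equalityFirst((byte) c1, (byte) c2) != expected) {
                    mismatches++;
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches());
    }
}
```

This only checks that the two orderings agree; whether one is faster would 
still need a proper benchmark run with matching and mismatching inputs.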

-------------

PR: https://git.openjdk.org/jdk/pull/12632
