On Tue, 28 Oct 2025 20:15:26 GMT, Xueming Shen <[email protected]> wrote:

>> With the following changes
>> 
>> - Use int[] / code point storage instead of char[] / char for folding data.
>> - Add a fast path for Latin-1 vs. Latin-1 comparisons.
>> - Speed up the UTF-16 path as well.
>> 
>> The only remaining “slow” path is Latin-1 vs. UTF-16 for now.
>> Would appreciate some eagle eyes on the new fast-path implementation.  The 
>> tests suggest it’s working as expected :-)
>> 
>> **StringCompareToFoldCase.latin1UTF16          avgt   15  23.959 ± 2.357  
>> ns/op
>> StringCompareToFoldCase.latin1UTF16EQ        avgt   15  22.455 ± 0.104  ns/op
>> StringCompareToFoldCase.latin1UTF16EQFC      avgt   15  32.782 ± 0.962  ns/op
>> StringCompareToFoldCase.latin1UTF16FC        avgt   15  32.581 ± 0.725  
>> ns/op**
>> 
>> 
>> Benchmark                                    Mode  Cnt   Score   Error  Units
>> StringCompareToFoldCase.asciiLower           avgt   15  17.983 ± 0.208  ns/op
>> StringCompareToFoldCase.asciiLowerEQ         avgt   15   9.986 ± 0.254  ns/op
>> StringCompareToFoldCase.asciiLowerEQFC       avgt   15  10.781 ± 0.161  ns/op
>> StringCompareToFoldCase.asciiLowerFC         avgt   15  10.274 ± 0.136  ns/op
>> StringCompareToFoldCase.asciiUpperLower      avgt   15  12.465 ± 0.409  ns/op
>> StringCompareToFoldCase.asciiUpperLowerEQ    avgt   15  10.961 ± 0.407  ns/op
>> StringCompareToFoldCase.asciiUpperLowerEQFC  avgt   15   9.157 ± 0.166  ns/op
>> StringCompareToFoldCase.asciiUpperLowerFC    avgt   15   9.228 ± 0.254  ns/op
>> StringCompareToFoldCase.asciiWithDFFC        avgt   15  57.603 ± 1.540  ns/op
>> StringCompareToFoldCase.greekLower           avgt   15  39.593 ± 0.128  ns/op
>> StringCompareToFoldCase.greekLowerEQ         avgt   15  39.249 ± 0.032  ns/op
>> StringCompareToFoldCase.greekLowerEQFC       avgt   15  13.993 ± 0.346  ns/op
>> StringCompareToFoldCase.greekLowerFC         avgt   15  14.030 ± 0.454  ns/op
>> StringCompareToFoldCase.greekUpperLower      avgt   15   7.137 ± 0.130  ns/op
>> StringCompareToFoldCase.greekUpperLowerEQ    avgt   15   7.496 ± 0.104  ns/op
>> StringCompareToFoldCase.greekUpperLowerEQFC  avgt   15   6.203 ± 0.316  ns/op
>> StringCompareToFoldCase.greekUpperLowerFC    avgt   15   6.235 ± 0.256  ns/op
>> StringCompareToFoldCase.latin1UTF16          avgt   15  23.959 ± 2.357  ns/op
>> StringCompareToFoldCase.latin1UTF16EQ        avgt   15  22.455 ± 0.104  ns/op
>> StringCompareToFoldCase.latin1UTF16EQFC      avgt   15  32.782 ± 0.962  ns/op
>> StringCompareToFoldCase.latin1UTF16FC        avgt   15  32.581 ± 0.725  ns/op
>> StringCompare...
>
> Experimenting with Arrays.mismatch at the beginning of the array iteration as 
> 
> 
>         int k = ArraysSupport.mismatch(value, other, lim);
>         if (k < 0)
>             return len - olen;
>         for (; k < lim; k++) {
>             ....
>         }
> 
> 
> The benchmark results suggest that it does help 'dramatically' when the 
> compared strings share with the same prefix.  For example those "UpperLower" 
> test cases (which shares the same upper cases text prefix. However it is also 
> relatively expensive, with a 20%-ish overhead when the strings do not share 
> the same string text but are case-insensitively equals.  I would suggest 
> let's leave it out for now?
> 
> ### With Arrays.mismatch
> 
> 
> Benchmark                                    Mode  Cnt   Score   Error  Units
> StringCompareToFoldCase.asciiLower           avgt   15  15.044 ± 0.751  ns/op
> StringCompareToFoldCase.asciiLowerEQ         avgt   15  10.033 ± 0.366  ns/op
> StringCompareToFoldCase.asciiLowerEQFC       avgt   15  12.094 ± 0.288  ns/op
> StringCompareToFoldCase.asciiLowerFC         avgt   15  12.513 ± 0.290  ns/op
> StringCompareToFoldCase.asciiUpperLower      avgt   15  11.716 ± 0.471  ns/op
> StringCompareToFoldCase.asciiUpperLowerEQ    avgt   15  11.120 ± 0.458  ns/op
> StringCompareToFoldCase.asciiUpperLowerEQFC  avgt   15   7.544 ± 0.103  ns/op
> StringCompareToFoldCase.asciiUpperLowerFC    avgt   15   7.384 ± 0.167  ns/op
> StringCompareToFoldCase.asciiWithDFFC        avgt   15  54.949 ± 1.260  ns/op
> StringCompareToFoldCase.greekLower           avgt   15  39.492 ± 0.124  ns/op
> StringCompareToFoldCase.greekLowerEQ         avgt   15  39.266 ± 0.071  ns/op
> StringCompareToFoldCase.greekLowerEQFC       avgt   15  28.049 ± 0.292  ns/op
> StringCompareToFoldCase.greekLowerFC         avgt   15  28.272 ± 0.115  ns/op
> StringCompareToFoldCase.greekUpperLower      avgt   15   7.103 ± 0.052  ns/op
> StringCompareToFoldCase.greekUpperLowerEQ    avgt   15   7.439 ± 0.079  ns/op
> StringCompareToFoldCase.greekUpperLowerEQFC  avgt   15   2.716 ± 0.138  ns/op
> StringCompareToFoldCase.greekUpperLowerFC    avgt   15   2.628 ± 0.051  ns/op
> StringCompareToFoldCase.latin1UTF16          avgt   15  23.147 ± 0.094  ns/op
> StringCompareToFoldCase.latin1UTF16EQ        avgt   15  22.626 ± 0.081  ns/op
> StringCompareToFoldCase.latin1UTF16EQFC      avgt   15  38.453 ± 0.697  ns/op
> StringCompareToFoldCase.latin1UTF16FC        avgt   15  38.464 ± 0.441  ns/op
> StringCompareToFoldCase.supLower             avgt   15  54.443 ± 0.162  ns/op
> StringCompareToFoldCase....

### Long:  packing  1:M-count + 1-3 folding codepoints

https://cr.openjdk.org/~sherman/casefolding_long/

The performance is slightly better, but not as good as I would have expected. 
The access to codepoint from the long looks a little clumsy,  but the logic 
looks smooth. need more work. opinion?


Benchmark                                    Mode  Cnt   Score   Error  Units
StringCompareToFoldCase.asciiLower           avgt   15  15.487 ± 0.298  ns/op
StringCompareToFoldCase.asciiLowerEQ         avgt   15  10.005 ± 0.368  ns/op
StringCompareToFoldCase.asciiLowerEQFC       avgt   15  10.755 ± 0.160  ns/op
StringCompareToFoldCase.asciiLowerFC         avgt   15  10.349 ± 0.155  ns/op
StringCompareToFoldCase.asciiUpperLower      avgt   15  12.188 ± 0.278  ns/op
StringCompareToFoldCase.asciiUpperLowerEQ    avgt   15  10.901 ± 0.551  ns/op
StringCompareToFoldCase.asciiUpperLowerEQFC  avgt   15   9.218 ± 0.165  ns/op
StringCompareToFoldCase.asciiUpperLowerFC    avgt   15   9.335 ± 0.404  ns/op
StringCompareToFoldCase.asciiWithDFFC        avgt   15  37.010 ± 0.518  ns/op
StringCompareToFoldCase.greekLower           avgt   15  39.572 ± 0.098  ns/op
StringCompareToFoldCase.greekLowerEQ         avgt   15  39.317 ± 0.104  ns/op
StringCompareToFoldCase.greekLowerEQFC       avgt   15  20.428 ± 0.243  ns/op
StringCompareToFoldCase.greekLowerFC         avgt   15  19.623 ± 0.141  ns/op
StringCompareToFoldCase.greekUpperLower      avgt   15   7.105 ± 0.048  ns/op
StringCompareToFoldCase.greekUpperLowerEQ    avgt   15   7.462 ± 0.092  ns/op
StringCompareToFoldCase.greekUpperLowerEQFC  avgt   15   6.518 ± 0.128  ns/op
StringCompareToFoldCase.greekUpperLowerFC    avgt   15   6.593 ± 0.240  ns/op
StringCompareToFoldCase.latin1UTF16          avgt   15  23.130 ± 0.152  ns/op
StringCompareToFoldCase.latin1UTF16EQ        avgt   15  22.606 ± 0.089  ns/op
StringCompareToFoldCase.latin1UTF16EQFC      avgt   15  29.574 ± 0.348  ns/op
StringCompareToFoldCase.latin1UTF16FC        avgt   15  29.691 ± 0.445  ns/op
StringCompareToFoldCase.supLower             avgt   15  55.027 ± 0.676  ns/op
StringCompareToFoldCase.supLowerEQ           avgt   15  55.784 ± 0.368  ns/op
StringCompareToFoldCase.supLowerEQFC         avgt   15  24.984 ± 0.157  ns/op
StringCompareToFoldCase.supLowerFC           avgt   15  24.865 ± 0.139  ns/op
StringCompareToFoldCase.supUpperLower        avgt   15  14.538 ± 0.144  ns/op
StringCompareToFoldCase.supUpperLowerEQ      avgt   15  14.728 ± 0.206  ns/op
StringCompareToFoldCase.supUpperLowerEQFC    avgt   15   9.932 ± 0.121  ns/op
StringCompareToFoldCase.supUpperLowerFC      avgt   15   9.870 ± 0.119  ns/op

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27628#discussion_r2471532292

Reply via email to