Re: RFR: 8321283: Reuse StringLatin1::equals in regionMatches

Francesco Nigro Wed, 26 Nov 2025 12:14:57 -0800

On Sat, 2 Dec 2023 16:56:22 GMT, Francesco Nigro <[email protected]> wrote:


> This improvement was found in 
> https://github.com/vert-x3/vertx-web/pull/2526.
> 
> It can potentially affect the performance of the existing 
> ArraysSupport.mismatch caller code path, i.e. it requires investigation.
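
To make the idea in the title concrete, here is a rough sketch on plain byte
arrays. It is not the JDK code: `RegionMatchSketch` and its method are
hypothetical, and the public `Arrays.equals` / `Arrays.mismatch` stand in for
the internal `StringLatin1::equals` and `ArraysSupport.mismatch`. The assumed
fast path is "the region covers both arrays entirely, so a plain equality
check is enough".

```java
import java.util.Arrays;

final class RegionMatchSketch {
    /** Hypothetical stand-in for a Latin-1 region comparison over raw bytes. */
    static boolean regionMatches(byte[] a, int aOff, byte[] b, int bOff, int len) {
        // Out-of-range regions never match, as with String.regionMatches.
        if (aOff < 0 || bOff < 0 || len > a.length - aOff || len > b.length - bOff) {
            return false;
        }
        if (len <= 0) {
            return true; // an empty region trivially matches
        }
        // Assumed fast path: the region spans both arrays completely,
        // so the whole-array equality check can be reused.
        if (aOff == 0 && bOff == 0 && len == a.length && len == b.length) {
            return Arrays.equals(a, b);
        }
        // General path: range-based mismatch; a negative result means no mismatch.
        return Arrays.mismatch(a, aOff, aOff + len, b, bOff, bOff + len) < 0;
    }

    public static void main(String[] args) {
        byte[] x = {1, 2, 3, 4};
        byte[] y = {1, 2, 3, 4};
        byte[] z = {9, 2, 3, 4};
        System.out.println(regionMatches(x, 0, y, 0, 4)); // whole-array path
        System.out.println(regionMatches(x, 1, z, 1, 3)); // range path
    }
}
```

Whether such a fast path pays off depends on how the two underlying
intrinsics behave for short inputs, which is what the numbers below look at.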

@schlosna 

I ran `TEST="micro:java.lang.StringComparisons.regionMatches"` on an AMD 
[email protected] GHz with the tuned network-latency profile enabled and turbo 
boost disabled, using `numactl --localalloc -N 0` to avoid NUMA effects on 
heap objects.

baseline at 25f9af99be1c906fc85b8192df8fa50cced3474f:

Benchmark                             (size)  (utf16)  Mode  Cnt    Score   Error  Units
StringComparisons.regionMatches            6     true  avgt    5    4.380 ± 0.030  ns/op
StringComparisons.regionMatches            6    false  avgt    5    5.772 ± 0.056  ns/op
StringComparisons.regionMatches           15     true  avgt    5    4.005 ± 0.104  ns/op
StringComparisons.regionMatches           15    false  avgt    5    4.030 ± 0.055  ns/op
StringComparisons.regionMatches         1024     true  avgt    5   30.037 ± 0.089  ns/op
StringComparisons.regionMatches         1024    false  avgt    5   17.734 ± 0.092  ns/op
StringComparisons.regionMatchesRange       6     true  avgt    5    4.825 ± 0.067  ns/op
StringComparisons.regionMatchesRange       6    false  avgt    5    5.878 ± 0.056  ns/op
StringComparisons.regionMatchesRange      15     true  avgt    5    5.736 ± 0.069  ns/op
StringComparisons.regionMatchesRange      15    false  avgt    5    5.447 ± 0.028  ns/op
StringComparisons.regionMatchesRange    1024     true  avgt    5   31.169 ± 0.009  ns/op
StringComparisons.regionMatchesRange    1024    false  avgt    5   16.614 ± 0.168  ns/op

With 1bd619a5bd2faa8057cb85105b2c9b4997fbf2ac:

Benchmark                             (size)  (utf16)  Mode  Cnt    Score   Error  Units
StringComparisons.regionMatches            6     true  avgt    5    3.535 ± 0.022  ns/op
StringComparisons.regionMatches            6    false  avgt    5    3.134 ± 0.022  ns/op
StringComparisons.regionMatches           15     true  avgt    5    2.568 ± 0.022  ns/op
StringComparisons.regionMatches           15    false  avgt    5    3.415 ± 0.017  ns/op
StringComparisons.regionMatches         1024     true  avgt    5   30.052 ± 0.070  ns/op
StringComparisons.regionMatches         1024    false  avgt    5   17.024 ± 0.111  ns/op
StringComparisons.regionMatchesRange       6     true  avgt    5    4.819 ± 0.010  ns/op
StringComparisons.regionMatchesRange       6    false  avgt    5    5.888 ± 0.083  ns/op
StringComparisons.regionMatchesRange      15     true  avgt    5    5.849 ± 0.106  ns/op
StringComparisons.regionMatchesRange      15    false  avgt    5    5.466 ± 0.069  ns/op
StringComparisons.regionMatchesRange    1024     true  avgt    5   31.177 ± 0.015  ns/op
StringComparisons.regionMatchesRange    1024    false  avgt    5   16.872 ± 0.387  ns/op

This translates into a speedup of up to ~1.8x for small strings (size 6, 
Latin-1: from 5.772 to 3.134 ns/op), which is still a fairly common use case, 
while larger strings and the `regionMatchesRange` cases seem unchanged. 
I'm adding a better benchmark to show the improvement in the positive 
(matching) case as well.
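
The speedup figure is simply the baseline score divided by the patched score.
A minimal sketch, with scores copied from the `regionMatches` rows of the two
tables above; the class and method names are made up for illustration:

```java
final class RegionMatchesSpeedup {
    /** Speedup factor: baseline time divided by patched time (both in ns/op). */
    static double speedup(double baselineNs, double patchedNs) {
        return baselineNs / patchedNs;
    }

    public static void main(String[] args) {
        // Columns: size, utf16 flag (1 = true, 0 = false), baseline ns/op, patched ns/op.
        double[][] rows = {
            {6, 1, 4.380, 3.535},
            {6, 0, 5.772, 3.134},
            {15, 1, 4.005, 2.568},
            {15, 0, 4.030, 3.415},
            {1024, 1, 30.037, 30.052},
            {1024, 0, 17.734, 17.024},
        };
        for (double[] r : rows) {
            System.out.printf("size=%d utf16=%b speedup=%.2fx%n",
                    (int) r[0], r[1] == 1, speedup(r[2], r[3]));
        }
    }
}
```

Values near 1.0 mean no measurable change, which is what the size 1024 rows
show.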

The new commit introduces the fully positive (matching) use case, which may 
be relevant for strings with few characters, and adds a comparison against 
`String::equals`, which likely performs the bare minimum of checks compared 
to `regionMatches`.
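
For readers without the benchmark source at hand, the two shapes being
compared can be sketched in plain Java. This assumes `same` boils down to
`String::equals` and `sameRegionMatches` to a `regionMatches` call covering
the whole string; the real benchmark bodies are not reproduced here, so the
method bodies are a guess:

```java
final class SameVsRegionMatches {
    // Assumed shape of the `same` benchmark: a plain equality check.
    static boolean same(String a, String b) {
        return a.equals(b);
    }

    // Assumed shape of `sameRegionMatches`: a region covering all of `b`.
    // This is only an equality check when both strings have the same length.
    static boolean sameRegionMatches(String a, String b) {
        return a.regionMatches(0, b, 0, b.length());
    }

    public static void main(String[] args) {
        String latin1 = "abcdef";
        String utf16 = "abcde\u20AC"; // a non-Latin-1 char forces the UTF-16 coder
        // new String(...) creates distinct instances, so equals cannot
        // return early on reference identity.
        System.out.println(same(latin1, new String(latin1)));
        System.out.println(sameRegionMatches(latin1, new String(latin1)));
        System.out.println(same(utf16, new String(utf16)));
        System.out.println(sameRegionMatches(utf16, new String(utf16)));
    }
}
```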

baseline at 25f9af99be1c906fc85b8192df8fa50cced3474f:

Benchmark                            (size)  (utf16)  Mode  Cnt   Score   Error  Units
StringComparisons.same                    6     true  avgt    5   2.402 ± 0.028  ns/op
StringComparisons.same                    6    false  avgt    5   2.056 ± 0.056  ns/op
StringComparisons.same                   15     true  avgt    5   3.733 ± 0.161  ns/op
StringComparisons.same                   15    false  avgt    5   2.807 ± 0.214  ns/op
StringComparisons.same                 1024     true  avgt    5  23.485 ± 0.150  ns/op
StringComparisons.same                 1024    false  avgt    5  15.302 ± 0.232  ns/op

StringComparisons.sameRegionMatches       6     true  avgt    5   4.410 ± 0.078  ns/op
StringComparisons.sameRegionMatches       6    false  avgt    5   5.414 ± 0.028  ns/op
StringComparisons.sameRegionMatches      15     true  avgt    5   5.770 ± 0.021  ns/op
StringComparisons.sameRegionMatches      15    false  avgt    5   5.771 ± 0.035  ns/op
StringComparisons.sameRegionMatches    1024     true  avgt    5  30.964 ± 0.023  ns/op
StringComparisons.sameRegionMatches    1024    false  avgt    5  16.807 ± 0.181  ns/op

with 1bd619a5bd2faa8057cb85105b2c9b4997fbf2ac:

Benchmark                            (size)  (utf16)  Mode  Cnt   Score   Error  Units
StringComparisons.sameRegionMatches       6     true  avgt    5   3.442 ± 0.016  ns/op
StringComparisons.sameRegionMatches       6    false  avgt    5   3.117 ± 0.002  ns/op
StringComparisons.sameRegionMatches      15     true  avgt    5   4.759 ± 0.075  ns/op
StringComparisons.sameRegionMatches      15    false  avgt    5   3.813 ± 0.026  ns/op
StringComparisons.sameRegionMatches    1024     true  avgt    5  28.308 ± 0.058  ns/op
StringComparisons.sameRegionMatches    1024    false  avgt    5  16.774 ± 0.220  ns/op

As the previous results confirm, this PR helps most with small strings, but 
it is yet to be verified whether:
- this is due to better tail handling in the `equals` intrinsic
- larger strings are limited by the amount of cache activity

Running `perfnorm` against 25f9af99be1c906fc85b8192df8fa50cced3474f with `same` 
vs `sameRegionMatches` reveals that both have high IPC with a high number of 
branches and L1 cache loads (with nearly no misses), which means that branches 
and loads dominate the cost of computation, even though `sameRegionMatches` 
executes more instructions.
In short, with larger strings the difference between the two intrinsics just 
fades away, and whichever executes fewer instructions performs better.

@cl4es I can undraft this, but I have no powers to create a JDK issue myself :"/

Keep it alive!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16933#issuecomment-1838571952
PR Comment: https://git.openjdk.org/jdk/pull/16933#issuecomment-1838744137
PR Comment: https://git.openjdk.org/jdk/pull/16933#issuecomment-1838950133
PR Comment: https://git.openjdk.org/jdk/pull/16933#issuecomment-2048905394
