On Sat, 2 Dec 2023 16:56:22 GMT, Francesco Nigro <[email protected]> wrote:
> This improvement has been found on
> https://github.com/vert-x3/vertx-web/pull/2526.
>
> It can potentially affect the existing ArraysSupport.mismatch caller
> code-path performance ie requires investigation.

@schlosna

I ran `TEST="micro:java.lang.StringComparisons.regionMatches"` on an AMD [email protected] GHz with the tuned network-latency profile enabled and turbo boost disabled, using `numactl --localalloc -N 0` to avoid weird NUMA-like effects on heap objects.

baseline at 25f9af99be1c906fc85b8192df8fa50cced3474f:

Benchmark                             (size)  (utf16)  Mode  Cnt   Score   Error  Units
StringComparisons.regionMatches            6     true  avgt    5   4.380 ± 0.030  ns/op
StringComparisons.regionMatches            6    false  avgt    5   5.772 ± 0.056  ns/op
StringComparisons.regionMatches           15     true  avgt    5   4.005 ± 0.104  ns/op
StringComparisons.regionMatches           15    false  avgt    5   4.030 ± 0.055  ns/op
StringComparisons.regionMatches         1024     true  avgt    5  30.037 ± 0.089  ns/op
StringComparisons.regionMatches         1024    false  avgt    5  17.734 ± 0.092  ns/op
StringComparisons.regionMatchesRange       6     true  avgt    5   4.825 ± 0.067  ns/op
StringComparisons.regionMatchesRange       6    false  avgt    5   5.878 ± 0.056  ns/op
StringComparisons.regionMatchesRange      15     true  avgt    5   5.736 ± 0.069  ns/op
StringComparisons.regionMatchesRange      15    false  avgt    5   5.447 ± 0.028  ns/op
StringComparisons.regionMatchesRange    1024     true  avgt    5  31.169 ± 0.009  ns/op
StringComparisons.regionMatchesRange    1024    false  avgt    5  16.614 ± 0.168  ns/op

with 1bd619a5bd2faa8057cb85105b2c9b4997fbf2ac:

Benchmark                             (size)  (utf16)  Mode  Cnt   Score   Error  Units
StringComparisons.regionMatches            6     true  avgt    5   3.535 ± 0.022  ns/op
StringComparisons.regionMatches            6    false  avgt    5   3.134 ± 0.022  ns/op
StringComparisons.regionMatches           15     true  avgt    5   2.568 ± 0.022  ns/op
StringComparisons.regionMatches           15    false  avgt    5   3.415 ± 0.017  ns/op
StringComparisons.regionMatches         1024     true  avgt    5  30.052 ± 0.070  ns/op
StringComparisons.regionMatches         1024    false  avgt    5  17.024 ± 0.111  ns/op
StringComparisons.regionMatchesRange       6     true  avgt    5   4.819 ± 0.010  ns/op
StringComparisons.regionMatchesRange       6    false  avgt    5   5.888 ± 0.083  ns/op
StringComparisons.regionMatchesRange      15     true  avgt    5   5.849 ± 0.106  ns/op
StringComparisons.regionMatchesRange      15    false  avgt    5   5.466 ± 0.069  ns/op
StringComparisons.regionMatchesRange    1024     true  avgt    5  31.177 ± 0.015  ns/op
StringComparisons.regionMatchesRange    1024    false  avgt    5  16.872 ± 0.387  ns/op

This translates into a speedup of up to ~1.8x for small strings (still a fairly common use case), while larger ones seem unchanged.
I'm adding a better benchmark to show the improvement in the positive (matching) case as well.

The new commit introduces the fully positive use case (probably most relevant for strings with few characters) and adds a comparison against `String::equals` (which likely performs the bare minimum of checks compared to region matches).

baseline at 25f9af99be1c906fc85b8192df8fa50cced3474f:

Benchmark                             (size)  (utf16)  Mode  Cnt   Score   Error  Units
StringComparisons.same                     6     true  avgt    5   2.402 ± 0.028  ns/op
StringComparisons.same                     6    false  avgt    5   2.056 ± 0.056  ns/op
StringComparisons.same                    15     true  avgt    5   3.733 ± 0.161  ns/op
StringComparisons.same                    15    false  avgt    5   2.807 ± 0.214  ns/op
StringComparisons.same                  1024     true  avgt    5  23.485 ± 0.150  ns/op
StringComparisons.same                  1024    false  avgt    5  15.302 ± 0.232  ns/op
StringComparisons.sameRegionMatches        6     true  avgt    5   4.410 ± 0.078  ns/op
StringComparisons.sameRegionMatches        6    false  avgt    5   5.414 ± 0.028  ns/op
StringComparisons.sameRegionMatches       15     true  avgt    5   5.770 ± 0.021  ns/op
StringComparisons.sameRegionMatches       15    false  avgt    5   5.771 ± 0.035  ns/op
StringComparisons.sameRegionMatches     1024     true  avgt    5  30.964 ± 0.023  ns/op
StringComparisons.sameRegionMatches     1024    false  avgt    5  16.807 ± 0.181  ns/op

with 1bd619a5bd2faa8057cb85105b2c9b4997fbf2ac:

Benchmark                             (size)  (utf16)  Mode  Cnt   Score   Error  Units
StringComparisons.sameRegionMatches        6     true  avgt    5   3.442 ± 0.016  ns/op
StringComparisons.sameRegionMatches        6    false  avgt    5   3.117 ± 0.002  ns/op
StringComparisons.sameRegionMatches       15     true  avgt    5   4.759 ± 0.075  ns/op
StringComparisons.sameRegionMatches       15    false  avgt    5   3.813 ± 0.026  ns/op
StringComparisons.sameRegionMatches     1024     true  avgt    5  28.308 ± 0.058  ns/op
StringComparisons.sameRegionMatches     1024    false  avgt    5  16.774 ± 0.220  ns/op

As the previous results confirm, this PR helps most with small strings, but it is yet to be verified whether:
- this is due to better tail handling in the `equals` intrinsic
- larger strings are limited by the amount of cache activity

Running `perfnorm` against 25f9af99be1c906fc85b8192df8fa50cced3474f for `same` vs `sameRegionMatches` reveals that both have high IPC, with a high number of branches and L1 cache loads (with nearly no misses). Those branches and loads dominate the cost of the computation in both cases, even though `sameRegionMatches` executes more instructions.
In short, with larger strings the difference between the two intrinsics fades away, and whichever executes fewer instructions performs better.

@cl4es I can undraft this, but I have no powers to create a JDK issue myself :"/

Keep it alive!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16933#issuecomment-1838571952
PR Comment: https://git.openjdk.org/jdk/pull/16933#issuecomment-1838744137
PR Comment: https://git.openjdk.org/jdk/pull/16933#issuecomment-1838950133
PR Comment: https://git.openjdk.org/jdk/pull/16933#issuecomment-2048905394
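
For context, here is a minimal sketch of the two operations that the `same` and `sameRegionMatches` results compare, assuming `same` measures `String::equals` and `sameRegionMatches` measures `String::regionMatches` over the full length of two equal strings. The class and method names are hypothetical, and this is not the JMH benchmark source, only an illustration of why the two calls return the same answer while taking different code paths:

```java
// Hypothetical illustration, NOT the actual StringComparisons benchmark code.
final class RegionMatchSketch {

    // Whole-string comparison via String::equals.
    static boolean same(String a, String b) {
        return a.equals(b);
    }

    // Full-range comparison via String::regionMatches. For strings of equal
    // length this returns the same result as equals, but the call validates
    // the offsets and the length before comparing the contents.
    static boolean sameRegionMatches(String a, String b) {
        return a.length() == b.length()
                && a.regionMatches(0, b, 0, a.length());
    }

    public static void main(String[] args) {
        // new String(...) forces distinct instances, so equals cannot
        // short-circuit on reference identity.
        String latin1 = "accept";
        String latin1Copy = new String("accept");
        // A char above 0xFF makes the string non-Latin-1 (the utf16=true case).
        String utf16 = "\u4e2d\u6587-key";
        String utf16Copy = new String("\u4e2d\u6587-key");

        System.out.println(same(latin1, latin1Copy));
        System.out.println(sameRegionMatches(latin1, latin1Copy));
        System.out.println(same(utf16, utf16Copy));
        System.out.println(sameRegionMatches(utf16, utf16Copy));
    }
}
```

Both methods agree on every input, so any timing gap between them comes from the checks and the comparison routine each path goes through, not from a difference in semantics.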
