Hi,

Tobias weighed in on this in another thread[1], and while he thinks the
proposed patch is semantically correct, concerns were raised that the
UTF16 intrinsics might be superior on some platforms.

I ran the microbenchmark below as well as the existing
string-density-benchmark[2] suite, and saw no statistically significant
differences in peak performance on x86_64 (Windows, Linux, Mac) and
SPARC T4 through M7. Warmup improvements are similar across all
platforms.

So from our point of view things look green.

However, I have no means of testing the intrinsics on other platforms
(S390, aarch64, ppc), so it'd be much appreciated if performance could
be verified on those platforms using the proposed patch and benchmark.

From inspecting the code, the difference should be negligible or even
positive. E.g., on aarch64 there's a trailing comparison that is elided
when treating the byte[] as a char[] - an overhead that is possibly
offset entirely by removing an extra branch before going into the
intrinsics.

Thanks!

/Claes

[1] http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-December/057240.html

[2] http://cr.openjdk.java.net/~shade/density/string-density-bench.zip
(had to remove the hg maven plugin and the reference to
sun.misc.Version, and update the JMH version, for this to build and run
on the latest JDK)

On 2018-12-08 01:11, Claes Redestad wrote:
Hi,

following up on discussions during the review of JDK-8214971[1], I
examined the startup and peak performance of a few different variants of
writing String::equals.

Webrev: http://cr.openjdk.java.net/~redestad/8215017/jdk.00/
Bug: https://bugs.openjdk.java.net/browse/JDK-8215017

- folding coder() == aString.coder() into sameCoder(aString) helps
interpreter without adversely affecting higher optimization levels

- Jim's proposal to use Arrays.equals is _interesting_: it improves
peak performance on some inputs but regresses it on others. I'll defer
that to a future RFE as it needs a more thorough examination.

- what we can do is simplify to only use StringLatin1.equals. If I'm not
completely mistaken these are all semantically neutral (and
StringUTF16.equals effectively redundant). If I'm completely wrong here
I'll welcome the learning opportunity. :-)

This removes a branch and two method calls, and for UTF16 Strings we'll
use a simpler algorithm early, which turns out to be beneficial in the
interpreter and at C1 level.
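
To illustrate why a single byte-wise comparison can serve both coders,
here is a minimal sketch. It is not the JDK code: CompactStr,
contentEquals and the LATIN1/UTF16 constants are hypothetical stand-ins
for String's internal byte[] value and coder fields. The assumption is
that once the coders match, both arrays use the same encoding, so equal
bytes imply equal chars and vice versa.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a compact string: a byte[] plus a coder flag.
// This only models the layout; it is not how java.lang.String is built.
final class CompactStr {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    final byte[] value;
    final byte coder;

    CompactStr(String s) {
        // Use one byte per char when every char fits in Latin-1,
        // otherwise two bytes per char (little-endian chosen arbitrarily).
        boolean latin1 = s.chars().allMatch(c -> c <= 0xFF);
        this.coder = latin1 ? LATIN1 : UTF16;
        this.value = latin1
                ? s.getBytes(StandardCharsets.ISO_8859_1)
                : s.getBytes(StandardCharsets.UTF_16LE);
    }

    // A single byte-wise loop used for both coders. With matching coders
    // the arrays share an encoding, so no UTF16-specific path is needed.
    boolean contentEquals(CompactStr other) {
        if (this == other) {
            return true;
        }
        if (coder != other.coder) {
            return false; // different coders can never hold equal content
        }
        byte[] a = value;
        byte[] b = other.value;
        if (a.length != b.length) {
            return false;
        }
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) {
                return false;
            }
        }
        return true;
    }
}
```

The sketch only covers the semantic argument; it says nothing about how
the intrinsics behave on a given platform.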

I added a simple microbenchmark to explore this. Results show 1.2-2.5x
improvements in interpreter performance, while results for optimized
code remain perfectly neutral on this simple micro[2].

This could be extended to clean up and move StringLatin1.equals back
into String and remove StringUTF16, but we'd also need to rearrange the
intrinsics on the VM side. Let me know what you think.

Thanks!

/Claes

[1] http://mail.openjdk.java.net/pipermail/core-libs-dev/2018-December/057162.html

[2]
========== Baseline =================

-Xint
Benchmark                            Mode  Cnt     Score    Error  Units
StringEquals.equalsAlmostEqual       avgt    4   968.640 ±  1.337  ns/op
StringEquals.equalsAlmostEqualUTF16  avgt    4  2082.007 ±  5.303  ns/op
StringEquals.equalsDifferent         avgt    4   583.166 ± 29.461  ns/op
StringEquals.equalsDifferentCoders   avgt    4   422.993 ±  1.291  ns/op
StringEquals.equalsEqual             avgt    4   988.671 ±  1.492  ns/op
StringEquals.equalsEqualsUTF16       avgt    4  2103.060 ±  5.705  ns/op

-XX:+CompactStrings
Benchmark                            Mode  Cnt   Score   Error  Units
StringEquals.equalsAlmostEqual       avgt    4  23.896 ± 0.089  ns/op
StringEquals.equalsAlmostEqualUTF16  avgt    4  23.935 ± 0.562  ns/op
StringEquals.equalsDifferent         avgt    4  15.086 ± 0.044  ns/op
StringEquals.equalsDifferentCoders   avgt    4  12.572 ± 0.008  ns/op
StringEquals.equalsEqual             avgt    4  25.143 ± 0.025  ns/op
StringEquals.equalsEqualsUTF16       avgt    4  25.148 ± 0.021  ns/op

-XX:-CompactStrings
Benchmark                            Mode  Cnt   Score   Error  Units
StringEquals.equalsAlmostEqual       avgt    4  24.539 ± 0.127  ns/op
StringEquals.equalsAlmostEqualUTF16  avgt    4  22.638 ± 0.047  ns/op
StringEquals.equalsDifferent         avgt    4  13.930 ± 0.835  ns/op
StringEquals.equalsDifferentCoders   avgt    4  13.836 ± 0.025  ns/op
StringEquals.equalsEqual             avgt    4  26.420 ± 0.020  ns/op
StringEquals.equalsEqualsUTF16       avgt    4  23.889 ± 0.037  ns/op

========== Fix ======================

-Xint
Benchmark                            Mode  Cnt    Score     Error  Units
StringEquals.equalsAlmostEqual       avgt    4  811.859 ±   8.663  ns/op
StringEquals.equalsAlmostEqualUTF16  avgt    4  802.784 ± 352.884  ns/op
StringEquals.equalsDifferent         avgt    4  431.837 ±   1.884  ns/op
StringEquals.equalsDifferentCoders   avgt    4  358.244 ±   1.208  ns/op
StringEquals.equalsEqual             avgt    4  832.056 ±   3.541  ns/op
StringEquals.equalsEqualsUTF16       avgt    4  832.434 ±   3.516  ns/op

-XX:+CompactStrings
Benchmark                            Mode  Cnt   Score   Error  Units
StringEquals.equalsAlmostEqual       avgt    4  23.906 ± 0.151  ns/op
StringEquals.equalsAlmostEqualUTF16  avgt    4  23.905 ± 0.123  ns/op
StringEquals.equalsDifferent         avgt    4  15.088 ± 0.023  ns/op
StringEquals.equalsDifferentCoders   avgt    4  12.575 ± 0.030  ns/op
StringEquals.equalsEqual             avgt    4  25.149 ± 0.059  ns/op
StringEquals.equalsEqualsUTF16       avgt    4  25.149 ± 0.033  ns/op

-XX:-CompactStrings
Benchmark                            Mode  Cnt   Score   Error  Units
StringEquals.equalsAlmostEqual       avgt    4  24.521 ± 0.050  ns/op
StringEquals.equalsAlmostEqualUTF16  avgt    4  22.639 ± 0.035  ns/op
StringEquals.equalsDifferent         avgt    4  13.831 ± 0.020  ns/op
StringEquals.equalsDifferentCoders   avgt    4  13.884 ± 0.345  ns/op
StringEquals.equalsEqual             avgt    4  26.395 ± 0.066  ns/op
StringEquals.equalsEqualsUTF16       avgt    4  23.904 ± 0.112  ns/op
