On Thu, 7 Apr 2022 07:01:23 GMT, Ludovic Henry <luhe...@openjdk.org> wrote:
>> Despite the hash value being cached for Strings, computing the hash still >> represents a significant CPU usage for applications handling lots of text. >> >> Even though it would be generally better to do it through an enhancement to >> the autovectorizer, the complexity of doing it by hand is trivial and the >> gain is sizable (2x speedup) even without the Vector API. The algorithm has >> been proposed by Richard Startin and Paul Sandoz [1]. >> >> Speedup are as follows on a `Intel(R) Xeon(R) E-2276G CPU @ 3.80GHz` >> >> >> Benchmark (size) Mode Cnt Score >> Error Units >> StringHashCode.Algorithm.scalarLatin1 0 avgt 25 2.111 >> ± 0.210 ns/op >> StringHashCode.Algorithm.scalarLatin1 1 avgt 25 3.500 >> ± 0.127 ns/op >> StringHashCode.Algorithm.scalarLatin1 10 avgt 25 7.001 >> ± 0.099 ns/op >> StringHashCode.Algorithm.scalarLatin1 100 avgt 25 61.285 >> ± 0.444 ns/op >> StringHashCode.Algorithm.scalarLatin1 1000 avgt 25 628.995 >> ± 0.846 ns/op >> StringHashCode.Algorithm.scalarLatin1 10000 avgt 25 6307.990 >> ± 4.071 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled16 0 avgt 25 2.358 >> ± 0.092 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled16 1 avgt 25 3.631 >> ± 0.159 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled16 10 avgt 25 7.049 >> ± 0.019 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled16 100 avgt 25 33.626 >> ± 1.218 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled16 1000 avgt 25 317.811 >> ± 1.225 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled16 10000 avgt 25 3212.333 >> ± 14.621 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled8 0 avgt 25 2.356 >> ± 0.097 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled8 1 avgt 25 3.630 >> ± 0.158 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled8 10 avgt 25 8.724 >> ± 0.065 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled8 100 avgt 25 32.402 >> ± 0.019 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled8 1000 avgt 25 321.949 >> ± 0.251 ns/op >> StringHashCode.Algorithm.scalarLatin1Unrolled8 10000 avgt 25 3202.083 >> ± 1.667 ns/op >> StringHashCode.Algorithm.scalarUTF16 0 avgt 25 2.135 >> ± 0.191 ns/op >> StringHashCode.Algorithm.scalarUTF16 1 avgt 25 5.202 >> ± 0.362 ns/op >> StringHashCode.Algorithm.scalarUTF16 10 avgt 25 11.105 >> ± 0.112 ns/op >> StringHashCode.Algorithm.scalarUTF16 100 avgt 25 75.974 >> ± 0.702 ns/op >> StringHashCode.Algorithm.scalarUTF16 1000 avgt 25 716.429 >> ± 3.290 ns/op >> StringHashCode.Algorithm.scalarUTF16 10000 avgt 25 7095.459 >> ± 43.847 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled16 0 avgt 25 2.381 >> ± 0.038 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled16 1 avgt 25 5.268 >> ± 0.422 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled16 10 avgt 25 11.248 >> ± 0.178 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled16 100 avgt 25 52.966 >> ± 0.089 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled16 1000 avgt 25 450.912 >> ± 1.834 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled16 10000 avgt 25 4403.988 >> ± 2.927 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled8 0 avgt 25 2.401 >> ± 0.032 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled8 1 avgt 25 5.091 >> ± 0.396 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled8 10 avgt 25 12.801 >> ± 0.189 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled8 100 avgt 25 52.068 >> ± 0.032 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled8 1000 avgt 25 453.270 >> ± 0.340 ns/op >> StringHashCode.Algorithm.scalarUTF16Unrolled8 10000 avgt 25 4433.112 >> ± 2.699 ns/op >> >> >> At Datadog, we handle a great amount of text (through logs management for >> example), and hashing String represents a large part of our CPU usage. It's >> very unlikely that we are the only one as String.hashCode is such a core >> feature of the JVM-based languages with its use in HashMap for example. >> Having even only a 2x speedup would allow us to save thousands of CPU cores >> per month and improve correspondingly the energy/carbon impact. >> >> [1] >> https://static.rainfocus.com/oracle/oow18/sess/1525822677955001tLqU/PF/codeone18-vector-API-DEV5081_1540354883936001Q3Sv.pdf > > Ludovic Henry has updated the pull request incrementally with one additional > commit since the last revision: > > Fix some merge conflicts Can't really see anything that I think is the direct cause of the error you're seeing, but there are a couple of places where `Op_AryHashCode` appears to be missing. src/hotspot/share/adlc/formssel.cpp line 903: > 901: strcmp(_matrule->_rChild->_opType,"StrComp" )==0 || > 902: strcmp(_matrule->_rChild->_opType,"StrEquals" )==0 || > 903: strcmp(_matrule->_rChild->_opType,"StrHashCode")==0 || Should you add `AryHashCode` here, too? src/hotspot/share/opto/loopnode.cpp line 5784: > 5782: case Op_StrComp: // Does a bunch of load-like effects > 5783: case Op_StrEquals: > 5784: case Op_StrHashCode: Seems reasonably that you need a `case Op_AryHashCode` here, too. ------------- PR: https://git.openjdk.java.net/jdk/pull/7700