> Despite the hash value being cached for Strings, computing the hash still > represents a significant CPU usage for applications handling lots of text. > > Even though it would be generally better to do it through an enhancement to > the autovectorizer, the complexity of doing it by hand is trivial and the > gain is sizable (2x speedup) even without the Vector API. The algorithm has > been proposed by Richard Startin and Paul Sandoz [1]. > > Speedup are as follows on a `Intel(R) Xeon(R) E-2276G CPU @ 3.80GHz` > > > Benchmark (size) Mode Cnt Score > Error Units > StringHashCode.Algorithm.scalarLatin1 0 avgt 25 2.111 > ± 0.210 ns/op > StringHashCode.Algorithm.scalarLatin1 1 avgt 25 3.500 > ± 0.127 ns/op > StringHashCode.Algorithm.scalarLatin1 10 avgt 25 7.001 > ± 0.099 ns/op > StringHashCode.Algorithm.scalarLatin1 100 avgt 25 61.285 > ± 0.444 ns/op > StringHashCode.Algorithm.scalarLatin1 1000 avgt 25 628.995 > ± 0.846 ns/op > StringHashCode.Algorithm.scalarLatin1 10000 avgt 25 6307.990 > ± 4.071 ns/op > StringHashCode.Algorithm.scalarLatin1Unrolled16 0 avgt 25 2.358 > ± 0.092 ns/op > StringHashCode.Algorithm.scalarLatin1Unrolled16 1 avgt 25 3.631 > ± 0.159 ns/op > StringHashCode.Algorithm.scalarLatin1Unrolled16 10 avgt 25 7.049 > ± 0.019 ns/op > StringHashCode.Algorithm.scalarLatin1Unrolled16 100 avgt 25 33.626 > ± 1.218 ns/op > StringHashCode.Algorithm.scalarLatin1Unrolled16 1000 avgt 25 317.811 > ± 1.225 ns/op > StringHashCode.Algorithm.scalarLatin1Unrolled16 10000 avgt 25 3212.333 > ± 14.621 ns/op > StringHashCode.Algorithm.scalarLatin1Unrolled8 0 avgt 25 2.356 > ± 0.097 ns/op > StringHashCode.Algorithm.scalarLatin1Unrolled8 1 avgt 25 3.630 > ± 0.158 ns/op > StringHashCode.Algorithm.scalarLatin1Unrolled8 10 avgt 25 8.724 > ± 0.065 ns/op > StringHashCode.Algorithm.scalarLatin1Unrolled8 100 avgt 25 32.402 > ± 0.019 ns/op > StringHashCode.Algorithm.scalarLatin1Unrolled8 1000 avgt 25 321.949 > ± 0.251 ns/op > StringHashCode.Algorithm.scalarLatin1Unrolled8 10000 avgt 25 3202.083 > ± 1.667 ns/op > StringHashCode.Algorithm.scalarUTF16 0 avgt 25 2.135 > ± 0.191 ns/op > StringHashCode.Algorithm.scalarUTF16 1 avgt 25 5.202 > ± 0.362 ns/op > StringHashCode.Algorithm.scalarUTF16 10 avgt 25 11.105 > ± 0.112 ns/op > StringHashCode.Algorithm.scalarUTF16 100 avgt 25 75.974 > ± 0.702 ns/op > StringHashCode.Algorithm.scalarUTF16 1000 avgt 25 716.429 > ± 3.290 ns/op > StringHashCode.Algorithm.scalarUTF16 10000 avgt 25 7095.459 > ± 43.847 ns/op > StringHashCode.Algorithm.scalarUTF16Unrolled16 0 avgt 25 2.381 > ± 0.038 ns/op > StringHashCode.Algorithm.scalarUTF16Unrolled16 1 avgt 25 5.268 > ± 0.422 ns/op > StringHashCode.Algorithm.scalarUTF16Unrolled16 10 avgt 25 11.248 > ± 0.178 ns/op > StringHashCode.Algorithm.scalarUTF16Unrolled16 100 avgt 25 52.966 > ± 0.089 ns/op > StringHashCode.Algorithm.scalarUTF16Unrolled16 1000 avgt 25 450.912 > ± 1.834 ns/op > StringHashCode.Algorithm.scalarUTF16Unrolled16 10000 avgt 25 4403.988 > ± 2.927 ns/op > StringHashCode.Algorithm.scalarUTF16Unrolled8 0 avgt 25 2.401 > ± 0.032 ns/op > StringHashCode.Algorithm.scalarUTF16Unrolled8 1 avgt 25 5.091 > ± 0.396 ns/op > StringHashCode.Algorithm.scalarUTF16Unrolled8 10 avgt 25 12.801 > ± 0.189 ns/op > StringHashCode.Algorithm.scalarUTF16Unrolled8 100 avgt 25 52.068 > ± 0.032 ns/op > StringHashCode.Algorithm.scalarUTF16Unrolled8 1000 avgt 25 453.270 > ± 0.340 ns/op > StringHashCode.Algorithm.scalarUTF16Unrolled8 10000 avgt 25 4433.112 > ± 2.699 ns/op > > > At Datadog, we handle a great amount of text (through logs management for > example), and hashing String represents a large part of our CPU usage. It's > very unlikely that we are the only one as String.hashCode is such a core > feature of the JVM-based languages with its use in HashMap for example. > Having even only a 2x speedup would allow us to save thousands of CPU cores > per month and improve correspondingly the energy/carbon impact. > > [1] > https://static.rainfocus.com/oracle/oow18/sess/1525822677955001tLqU/PF/codeone18-vector-API-DEV5081_1540354883936001Q3Sv.pdf
Ludovic Henry has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits: - Fix overlapping registers - Actually fix h when hashcode is vectorized - Merge branch 'master' of https://github.com/openjdk/jdk into vectorized-stringlatin1-hashcode - Fix h when vectorized for Arrays.hashCode - Add missing check for AryHashCode node - Fix some merge conflicts - Disable Arrays.hashCode intrinsic by default for CI - Merge branch 'master' of https://github.com/openjdk/jdk into vectorized-stringlatin1-hashcode - Some small refactoring: store power_of_31_backwards in the code directly, compact code, and more - {wip} Generalize string hashcode to Arrays.hashCode - ... and 8 more: https://git.openjdk.java.net/jdk/compare/3fa1c404...29dab16b ------------- Changes: https://git.openjdk.java.net/jdk/pull/7700/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7700&range=11 Stats: 1151 lines in 25 files changed: 1140 ins; 0 del; 11 mod Patch: https://git.openjdk.java.net/jdk/pull/7700.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/7700/head:pull/7700 PR: https://git.openjdk.java.net/jdk/pull/7700