On Fri, 4 Mar 2022 17:44:44 GMT, Ludovic Henry <luhe...@openjdk.org> wrote:
>> Despite the hash value being cached for Strings, computing the hash still
>> represents a significant share of CPU usage for applications handling lots
>> of text.
>>
>> Even though it would generally be better to do this through an enhancement
>> to the autovectorizer, the complexity of doing it by hand is trivial and
>> the gain is sizable (2x speedup) even without the Vector API. The algorithm
>> was proposed by Richard Startin and Paul Sandoz [1].
>>
>> Speedups are as follows on an `Intel(R) Xeon(R) E-2276G CPU @ 3.80GHz`:
>>
>> Benchmark                                  (size)  Mode  Cnt     Score  Error  Units
>> StringHashCode.Algorithm.scalar                 0  avgt          1.799         ns/op
>> StringHashCode.Algorithm.scalar                 1  avgt          3.233         ns/op
>> StringHashCode.Algorithm.scalar                10  avgt          6.617         ns/op
>> StringHashCode.Algorithm.scalar               100  avgt         61.481         ns/op
>> StringHashCode.Algorithm.scalar              1000  avgt        631.690         ns/op
>> StringHashCode.Algorithm.scalar             10000  avgt       6293.611         ns/op
>> StringHashCode.Algorithm.scalarUnrolled8        0  avgt          1.890         ns/op
>> StringHashCode.Algorithm.scalarUnrolled8        1  avgt          3.494         ns/op
>> StringHashCode.Algorithm.scalarUnrolled8       10  avgt          9.050         ns/op
>> StringHashCode.Algorithm.scalarUnrolled8      100  avgt         31.725         ns/op
>> StringHashCode.Algorithm.scalarUnrolled8     1000  avgt        321.031         ns/op
>> StringHashCode.Algorithm.scalarUnrolled8    10000  avgt       3203.838         ns/op
>> StringHashCode.Algorithm.scalarUnrolled16       0  avgt          1.953         ns/op
>> StringHashCode.Algorithm.scalarUnrolled16       1  avgt          3.485         ns/op
>> StringHashCode.Algorithm.scalarUnrolled16      10  avgt          7.041         ns/op
>> StringHashCode.Algorithm.scalarUnrolled16     100  avgt         30.975         ns/op
>> StringHashCode.Algorithm.scalarUnrolled16    1000  avgt        316.616         ns/op
>> StringHashCode.Algorithm.scalarUnrolled16   10000  avgt       3208.658         ns/op
>>
>> At Datadog, we handle a great amount of text (through logs management, for
>> example), and hashing Strings represents a large part of our CPU usage.
>> It's very unlikely that we are the only ones, as String.hashCode is such a
>> core feature of JVM-based languages, with its use in HashMap for example.
>> Having even only a 2x speedup would allow us to save thousands of CPU cores
>> per month and correspondingly improve the energy/carbon impact.
>>
>> [1] https://static.rainfocus.com/oracle/oow18/sess/1525822677955001tLqU/PF/codeone18-vector-API-DEV5081_1540354883936001Q3Sv.pdf
>
> Ludovic Henry has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   Add UTF-16 benchmarks

Could you print out the generated code? My wild guess is that the updated version is still scalar, and the improvement comes from breaking up the loop-carried dependency chain. I suggest splitting the accumulation across independent accumulators in the main loop (recombining them only afterwards) to achieve maximal scalar throughput. A small experiment on my machine shows an improvement over your approach:

    int c0 = 0, c1 = 0, c2 = 0, c3 = 0;
    int i = 0, len = value.length;
    for (; i < (len & (~3)); i += 4) {
        c3 = c3 * 923521 + (value[i + 3] & 0xff);
        c2 = c2 * 923521 + (value[i + 2] & 0xff);
        c1 = c1 * 923521 + (value[i + 1] & 0xff);
        c0 = c0 * 923521 + (value[i] & 0xff);
    }
    int h = c0 * 29791 + c1 * 961 + c2 * 31 + c3;
    for (; i < len; i++) {
        h = h * 31 + (value[i] & 0xff);
    }
    return h;

Thanks a lot.

-------------

PR: https://git.openjdk.java.net/jdk/pull/7700
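For reference, the equivalence of a 4-way unrolled polynomial hash with the plain scalar loop can be checked with a small standalone program. This is just a sketch to illustrate the technique, not part of the PR: the class and method names are made up here, and `value` is assumed to be a Latin-1 byte array as in String's compact internal representation. The constants are the powers of 31 (31^2 = 961, 31^3 = 29791, 31^4 = 923521).

```java
import java.util.Arrays;

public class UnrolledHashCheck {
    // Reference scalar hash: same recurrence as String.hashCode
    // over Latin-1 bytes (h = h * 31 + b for each byte).
    static int scalarHash(byte[] value) {
        int h = 0;
        for (byte b : value) {
            h = h * 31 + (b & 0xff);
        }
        return h;
    }

    // 4-way unrolled variant: four independent accumulators break the
    // loop-carried dependency chain; each lane advances by 31^4 per
    // iteration, and the lanes are recombined with 31^3, 31^2, 31, 1.
    static int unrolledHash(byte[] value) {
        int c0 = 0, c1 = 0, c2 = 0, c3 = 0;
        int i = 0, len = value.length;
        for (; i < (len & (~3)); i += 4) {
            c3 = c3 * 923521 + (value[i + 3] & 0xff);
            c2 = c2 * 923521 + (value[i + 2] & 0xff);
            c1 = c1 * 923521 + (value[i + 1] & 0xff);
            c0 = c0 * 923521 + (value[i] & 0xff);
        }
        // c0 holds the earliest byte of each group, so it gets the
        // highest weight (31^3); the scalar tail loop handles len % 4.
        int h = c0 * 29791 + c1 * 961 + c2 * 31 + c3;
        for (; i < len; i++) {
            h = h * 31 + (value[i] & 0xff);
        }
        return h;
    }

    public static void main(String[] args) {
        byte[][] cases = {
            {}, {42}, "hello".getBytes(), "0123456789abcdef".getBytes(),
            "a slightly longer Latin-1 test string".getBytes()
        };
        for (byte[] c : cases) {
            if (scalarHash(c) != unrolledHash(c)) {
                throw new AssertionError("mismatch for " + Arrays.toString(c));
            }
        }
        System.out.println("all cases match");
    }
}
```

For ASCII inputs, both methods also agree with `String.hashCode()`, since the byte values equal the char values there.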