On Fri, 4 Mar 2022 17:01:05 GMT, Roger Riggs <[email protected]> wrote:
>> Despite the hash value being cached for Strings, computing the hash still >> represents a significant CPU usage for applications handling lots of text. >> >> Even though it would be generally better to do it through an enhancement to >> the autovectorizer, the complexity of doing it by hand is trivial and the >> gain is sizable (2x speedup) even without the Vector API. The algorithm has >> been proposed by Richard Startin and Paul Sandoz [1]. >> >> Speedup are as follows on a `Intel(R) Xeon(R) E-2276G CPU @ 3.80GHz` >> >> >> Benchmark (size) Mode Cnt Score >> Error Units >> StringHashCode.Algorithm.scalar 0 avgt 1.799 >> ns/op >> StringHashCode.Algorithm.scalar 1 avgt 3.233 >> ns/op >> StringHashCode.Algorithm.scalar 10 avgt 6.617 >> ns/op >> StringHashCode.Algorithm.scalar 100 avgt 61.481 >> ns/op >> StringHashCode.Algorithm.scalar 1000 avgt 631.690 >> ns/op >> StringHashCode.Algorithm.scalar 10000 avgt 6293.611 >> ns/op >> StringHashCode.Algorithm.scalarUnrolled8 0 avgt 1.890 >> ns/op >> StringHashCode.Algorithm.scalarUnrolled8 1 avgt 3.494 >> ns/op >> StringHashCode.Algorithm.scalarUnrolled8 10 avgt 9.050 >> ns/op >> StringHashCode.Algorithm.scalarUnrolled8 100 avgt 31.725 >> ns/op >> StringHashCode.Algorithm.scalarUnrolled8 1000 avgt 321.031 >> ns/op >> StringHashCode.Algorithm.scalarUnrolled8 10000 avgt 3203.838 >> ns/op >> StringHashCode.Algorithm.scalarUnrolled16 0 avgt 1.953 >> ns/op >> StringHashCode.Algorithm.scalarUnrolled16 1 avgt 3.485 >> ns/op >> StringHashCode.Algorithm.scalarUnrolled16 10 avgt 7.041 >> ns/op >> StringHashCode.Algorithm.scalarUnrolled16 100 avgt 30.975 >> ns/op >> StringHashCode.Algorithm.scalarUnrolled16 1000 avgt 316.616 >> ns/op >> StringHashCode.Algorithm.scalarUnrolled16 10000 avgt 3208.658 >> ns/op >> >> >> At Datadog, we handle a great amount of text (through logs management for >> example), and hashing String represents a large part of our CPU usage. It's >> very unlikely that we are the only one as String.hashCode is such a core >> feature of the JVM-based languages with its use in HashMap for example. >> Having even only a 2x speedup would allow us to save thousands of CPU cores >> per month and improve correspondingly the energy/carbon impact. >> >> [1] >> https://static.rainfocus.com/oracle/oow18/sess/1525822677955001tLqU/PF/codeone18-vector-API-DEV5081_1540354883936001Q3Sv.pdf > > This straight forward enough. But I wonder if this should be a hotspot > intrinsic and be able to take advantage vector machine instructions. > I'd also expect there is an existing test that checks the value of > String.hashCode. @RogerRiggs I would be happy to do such work. As it would be a bigger change, are you suggesting that it could be done or that it should be done as an intrinsic? @cl4es adding that right now. ------------- PR: https://git.openjdk.java.net/jdk/pull/7700
