On Sun, 20 Dec 2020 20:27:03 GMT, Claes Redestad <redes...@openjdk.org> wrote:
> - The MD5 intrinsics added by
>   [JDK-8250902](https://bugs.openjdk.java.net/browse/JDK-8250902) show that
>   the `int[] x` isn't actually needed. This also applies to the SHA
>   intrinsics from which the MD5 intrinsic takes inspiration
> - Using VarHandles we can simplify the code in `ByteArrayAccess` enough to
>   make it acceptable to use inline and replace the array in MD5 wholesale.
>   This improves performance both in the presence and the absence of the
>   intrinsic optimization.
> - Doing the exact same thing in the SHA impls would be unwieldy (64+ element
>   arrays), but allocating the array lazily gets most of the speed-up in the
>   presence of an intrinsic while being neutral in its absence.
>
> Baseline:
>                              (digesterName)  (length)  Cnt     Score     Error  Units
> MessageDigests.digest                   MD5        16   15  2714.307 ±  21.133  ops/ms
> MessageDigests.digest                   MD5      1024   15   318.087 ±   0.637  ops/ms
> MessageDigests.digest                 SHA-1        16   15  1387.266 ±  40.932  ops/ms
> MessageDigests.digest                 SHA-1      1024   15   109.273 ±   0.149  ops/ms
> MessageDigests.digest               SHA-256        16   15   995.566 ±  21.186  ops/ms
> MessageDigests.digest               SHA-256      1024   15    89.104 ±   0.079  ops/ms
> MessageDigests.digest               SHA-512        16   15   803.030 ±  15.722  ops/ms
> MessageDigests.digest               SHA-512      1024   15   115.611 ±   0.234  ops/ms
> MessageDigests.getAndDigest             MD5        16   15  2190.367 ±  97.037  ops/ms
> MessageDigests.getAndDigest             MD5      1024   15   302.903 ±   1.809  ops/ms
> MessageDigests.getAndDigest           SHA-1        16   15  1262.656 ±  43.751  ops/ms
> MessageDigests.getAndDigest           SHA-1      1024   15   104.889 ±   3.554  ops/ms
> MessageDigests.getAndDigest         SHA-256        16   15   914.541 ±  55.621  ops/ms
> MessageDigests.getAndDigest         SHA-256      1024   15    85.708 ±   1.394  ops/ms
> MessageDigests.getAndDigest         SHA-512        16   15   737.719 ±  53.671  ops/ms
> MessageDigests.getAndDigest         SHA-512      1024   15   112.307 ±   1.950  ops/ms
>
> GC:
> MessageDigests.getAndDigest:·gc.alloc.rate.norm             MD5        16   15   312.011 ±   0.005  B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm           SHA-1        16   15   584.020 ±   0.006  B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm         SHA-256        16   15   544.019 ±   0.016  B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm         SHA-512        16   15  1056.037 ±   0.003  B/op
>
> Target:
> Benchmark                    (digesterName)  (length)  Cnt     Score     Error  Units
> MessageDigests.digest                   MD5        16   15  3134.462 ±  43.685  ops/ms
> MessageDigests.digest                   MD5      1024   15   323.667 ±   0.633  ops/ms
> MessageDigests.digest                 SHA-1        16   15  1418.742 ±  38.223  ops/ms
> MessageDigests.digest                 SHA-1      1024   15   110.178 ±   0.788  ops/ms
> MessageDigests.digest               SHA-256        16   15  1037.949 ±  21.214  ops/ms
> MessageDigests.digest               SHA-256      1024   15    89.671 ±   0.228  ops/ms
> MessageDigests.digest               SHA-512        16   15   812.028 ±  39.489  ops/ms
> MessageDigests.digest               SHA-512      1024   15   116.738 ±   0.249  ops/ms
> MessageDigests.getAndDigest             MD5        16   15  2314.379 ± 229.294  ops/ms
> MessageDigests.getAndDigest             MD5      1024   15   307.835 ±   5.730  ops/ms
> MessageDigests.getAndDigest           SHA-1        16   15  1326.887 ±  63.263  ops/ms
> MessageDigests.getAndDigest           SHA-1      1024   15   106.611 ±   2.292  ops/ms
> MessageDigests.getAndDigest         SHA-256        16   15   961.589 ±  82.052  ops/ms
> MessageDigests.getAndDigest         SHA-256      1024   15    88.646 ±   0.194  ops/ms
> MessageDigests.getAndDigest         SHA-512        16   15   775.417 ±  56.775  ops/ms
> MessageDigests.getAndDigest         SHA-512      1024   15   112.904 ±   2.014  ops/ms
>
> GC:
> MessageDigests.getAndDigest:·gc.alloc.rate.norm             MD5        16   15   232.009 ±   0.006  B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm           SHA-1        16   15   584.021 ±   0.001  B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm         SHA-256        16   15   272.012 ±   0.015  B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm         SHA-512        16   15   400.017 ±   0.019  B/op
>
> For the `digest` micro, digesting small inputs is faster with all
> algorithms, ranging from ~1% for SHA-512 up to ~15% for MD5. The gain stems
> from not allocating and reading into a temporary buffer once outside of the
> intrinsic. SHA-1 does not see a statistically significant gain because the
> intrinsic is disabled by default on my HW.
>
> For the `getAndDigest` micro - which tests
> `MessageDigest.getInstance(..).digest(..)` - there are similar gains with
> this patch. The interesting aspect here is verifying the reduction in
> allocations per operation when there's an active intrinsic (again, not for
> SHA-1). JDK-8259065 (#1933) reduced allocations on each of these by
> 144B/op, which means allocation pressure for SHA-512 is down two thirds,
> from 1200B/op to 400B/op, in this contrived test.
>
> I've verified there are no regressions in the absence of the intrinsic -
> which the SHA-1 numbers here help show.

test/micro/org/openjdk/bench/java/util/UUIDBench.java line 2:

> 1: /*
> 2:  * Copyright (c) 2020, 2021, Oracle and/or its affiliates. All rights reserved.

nit: other files should also have this 2021 update. It seems most of them have not been updated and still use 2020.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1855
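
For readers unfamiliar with the VarHandle approach mentioned in the quoted description, here is a minimal sketch of reading a little-endian `int` directly from a `byte[]` without first copying into a temporary `int[]`. The class and method names (`LittleEndianReads`, `readIntLE`, `readIntLEManual`) are hypothetical and only for illustration; this is not the code in `ByteArrayAccess` or in the patch.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical helper for illustration only; not the JDK's ByteArrayAccess.
final class LittleEndianReads {
    // Views a byte[] as little-endian ints. The index passed to get() is a
    // byte offset into the array, not an int index.
    private static final VarHandle LE_INT =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    private LittleEndianReads() {}

    // Reads four bytes starting at byteOffset as one little-endian int.
    static int readIntLE(byte[] buf, int byteOffset) {
        return (int) LE_INT.get(buf, byteOffset);
    }

    // Shift-and-mask equivalent, shown only to make the semantics explicit.
    static int readIntLEManual(byte[] buf, int byteOffset) {
        return (buf[byteOffset] & 0xff)
                | (buf[byteOffset + 1] & 0xff) << 8
                | (buf[byteOffset + 2] & 0xff) << 16
                | (buf[byteOffset + 3] & 0xff) << 24;
    }
}
```

With a helper along these lines, a block-processing loop can pull each word from the input buffer where it is needed, which is the kind of inlining the description refers to for MD5.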