On Sun, 20 Dec 2020 20:27:03 GMT, Claes Redestad <redes...@openjdk.org> wrote:
> - The MD5 intrinsics added by [JDK-8250902](https://bugs.openjdk.java.net/browse/JDK-8250902) show that the `int[] x` isn't actually needed. This also applies to the SHA intrinsics from which the MD5 intrinsic takes inspiration.
> - Using VarHandles we can simplify the code in `ByteArrayAccess` enough to make it acceptable to use inline and replace the array in MD5 wholesale. This improves performance both in the presence and the absence of the intrinsic optimization.
> - Doing the exact same thing in the SHA impls would be unwieldy (64+ element arrays), but allocating the array lazily gets most of the speed-up in the presence of an intrinsic while being neutral in its absence.
>
> Baseline:
> Benchmark                    (digesterName)  (length)  Cnt     Score       Error   Units
> MessageDigests.digest                   MD5        16   15  2714.307 ±  21.133  ops/ms
> MessageDigests.digest                   MD5      1024   15   318.087 ±   0.637  ops/ms
> MessageDigests.digest                 SHA-1        16   15  1387.266 ±  40.932  ops/ms
> MessageDigests.digest                 SHA-1      1024   15   109.273 ±   0.149  ops/ms
> MessageDigests.digest               SHA-256        16   15   995.566 ±  21.186  ops/ms
> MessageDigests.digest               SHA-256      1024   15    89.104 ±   0.079  ops/ms
> MessageDigests.digest               SHA-512        16   15   803.030 ±  15.722  ops/ms
> MessageDigests.digest               SHA-512      1024   15   115.611 ±   0.234  ops/ms
> MessageDigests.getAndDigest             MD5        16   15  2190.367 ±  97.037  ops/ms
> MessageDigests.getAndDigest             MD5      1024   15   302.903 ±   1.809  ops/ms
> MessageDigests.getAndDigest           SHA-1        16   15  1262.656 ±  43.751  ops/ms
> MessageDigests.getAndDigest           SHA-1      1024   15   104.889 ±   3.554  ops/ms
> MessageDigests.getAndDigest         SHA-256        16   15   914.541 ±  55.621  ops/ms
> MessageDigests.getAndDigest         SHA-256      1024   15    85.708 ±   1.394  ops/ms
> MessageDigests.getAndDigest         SHA-512        16   15   737.719 ±  53.671  ops/ms
> MessageDigests.getAndDigest         SHA-512      1024   15   112.307 ±   1.950  ops/ms
>
> GC:
> MessageDigests.getAndDigest:·gc.alloc.rate.norm      MD5  16  15   312.011 ± 0.005  B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm    SHA-1  16  15   584.020 ± 0.006  B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm  SHA-256  16  15   544.019 ± 0.016  B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm  SHA-512  16  15  1056.037 ± 0.003  B/op
>
> Target:
> Benchmark                    (digesterName)  (length)  Cnt     Score       Error   Units
> MessageDigests.digest                   MD5        16   15  3134.462 ±  43.685  ops/ms
> MessageDigests.digest                   MD5      1024   15   323.667 ±   0.633  ops/ms
> MessageDigests.digest                 SHA-1        16   15  1418.742 ±  38.223  ops/ms
> MessageDigests.digest                 SHA-1      1024   15   110.178 ±   0.788  ops/ms
> MessageDigests.digest               SHA-256        16   15  1037.949 ±  21.214  ops/ms
> MessageDigests.digest               SHA-256      1024   15    89.671 ±   0.228  ops/ms
> MessageDigests.digest               SHA-512        16   15   812.028 ±  39.489  ops/ms
> MessageDigests.digest               SHA-512      1024   15   116.738 ±   0.249  ops/ms
> MessageDigests.getAndDigest             MD5        16   15  2314.379 ± 229.294  ops/ms
> MessageDigests.getAndDigest             MD5      1024   15   307.835 ±   5.730  ops/ms
> MessageDigests.getAndDigest           SHA-1        16   15  1326.887 ±  63.263  ops/ms
> MessageDigests.getAndDigest           SHA-1      1024   15   106.611 ±   2.292  ops/ms
> MessageDigests.getAndDigest         SHA-256        16   15   961.589 ±  82.052  ops/ms
> MessageDigests.getAndDigest         SHA-256      1024   15    88.646 ±   0.194  ops/ms
> MessageDigests.getAndDigest         SHA-512        16   15   775.417 ±  56.775  ops/ms
> MessageDigests.getAndDigest         SHA-512      1024   15   112.904 ±   2.014  ops/ms
>
> GC:
> MessageDigests.getAndDigest:·gc.alloc.rate.norm      MD5  16  15   232.009 ± 0.006  B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm    SHA-1  16  15   584.021 ± 0.001  B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm  SHA-256  16  15   272.012 ± 0.015  B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm  SHA-512  16  15   400.017 ± 0.019  B/op
>
> For the `digest` micro, digesting small inputs is faster with all algorithms, ranging from ~1% for SHA-512 up to ~15% for MD5. The gain stems from not allocating and reading into a temporary buffer once outside of the intrinsic. SHA-1 does not see a statistically significant gain because the intrinsic is disabled by default on my HW.
>
> For the `getAndDigest` micro - which tests `MessageDigest.getInstance(..).digest(..)` - there are similar gains with this patch. The interesting aspect here is verifying the reduction in allocations per operation when there's an active intrinsic (again, not for SHA-1). JDK-8259065 (#1933) reduced allocations on each of these by 144B/op, which means allocation pressure for SHA-512 is down two thirds from 1200B/op to 400B/op in this contrived test.
>
> I've verified there are no regressions in the absence of the intrinsic - which the SHA-1 numbers here help show.

src/java.base/share/classes/sun/security/provider/ByteArrayAccess.java line 214:

Why do we remove the index checking from all methods? Isn't it safer to check here in case the caller didn't? Or is such checking already implemented inside the various methods of VarHandle?

-------------

PR: https://git.openjdk.java.net/jdk/pull/1855
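
On the bounds-checking question above, here is a minimal sketch of how a byte-array view `VarHandle` behaves. It is not the code from the PR: the class name `VarHandleBoundsSketch` and the helper `readIntLE` are hypothetical, and little-endian order is chosen only because MD5 reads its input words that way. As far as the `MethodHandles.byteArrayViewVarHandle` documentation goes, an access whose index is negative or greater than `array.length - 4` (for an `int` view) throws `IndexOutOfBoundsException`, so a read like the one below is range-checked even without an explicit check in the caller.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

class VarHandleBoundsSketch {
    // View a byte[] as little-endian ints; index arguments are byte offsets.
    private static final VarHandle LE_INT =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    // Hypothetical helper for illustration, not taken from ByteArrayAccess.
    static int readIntLE(byte[] buf, int offset) {
        // Plain get() on a byte-array view supports unaligned offsets.
        return (int) LE_INT.get(buf, offset);
    }

    public static void main(String[] args) {
        byte[] buf = {0x01, 0x02, 0x03, 0x04, 0x05};

        // Bytes 0..3 read as a little-endian int.
        System.out.println(Integer.toHexString(readIntLE(buf, 0))); // prints "4030201"

        try {
            // Would need bytes 2..5, but the array only has indices 0..4.
            readIntLE(buf, 2);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("range check performed by the VarHandle access");
        }
    }
}
```

Whether relying on that implicit check is acceptable in `ByteArrayAccess` is a separate design question for the PR; the sketch only shows that the access itself does not read past the end of the array.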