On Mon, 18 Jan 2021 13:39:04 GMT, Claes Redestad <redes...@openjdk.org> wrote:

>> - The MD5 intrinsics added by
>> [JDK-8250902](https://bugs.openjdk.java.net/browse/JDK-8250902) show that
>> the `int[] x` isn't actually needed. This also applies to the SHA intrinsics
>> from which the MD5 intrinsic takes inspiration.
>> - Using VarHandles we can simplify the code in `ByteArrayAccess` enough to
>> make it acceptable to use inline and replace the array in MD5 wholesale.
>> This improves performance both in the presence and the absence of the
>> intrinsic optimization.
>> - Doing the exact same thing in the SHA impls would be unwieldy (64+ element
>> arrays), but allocating the array lazily gets most of the speed-up in the
>> presence of an intrinsic while being neutral in its absence.
>>
>> Baseline:
>> Benchmark                    (digesterName)  (length)  Cnt     Score     Error  Units
>> MessageDigests.digest                   MD5        16   15  2714.307 ±  21.133  ops/ms
>> MessageDigests.digest                   MD5      1024   15   318.087 ±   0.637  ops/ms
>> MessageDigests.digest                 SHA-1        16   15  1387.266 ±  40.932  ops/ms
>> MessageDigests.digest                 SHA-1      1024   15   109.273 ±   0.149  ops/ms
>> MessageDigests.digest               SHA-256        16   15   995.566 ±  21.186  ops/ms
>> MessageDigests.digest               SHA-256      1024   15    89.104 ±   0.079  ops/ms
>> MessageDigests.digest               SHA-512        16   15   803.030 ±  15.722  ops/ms
>> MessageDigests.digest               SHA-512      1024   15   115.611 ±   0.234  ops/ms
>> MessageDigests.getAndDigest             MD5        16   15  2190.367 ±  97.037  ops/ms
>> MessageDigests.getAndDigest             MD5      1024   15   302.903 ±   1.809  ops/ms
>> MessageDigests.getAndDigest           SHA-1        16   15  1262.656 ±  43.751  ops/ms
>> MessageDigests.getAndDigest           SHA-1      1024   15   104.889 ±   3.554  ops/ms
>> MessageDigests.getAndDigest         SHA-256        16   15   914.541 ±  55.621  ops/ms
>> MessageDigests.getAndDigest         SHA-256      1024   15    85.708 ±   1.394  ops/ms
>> MessageDigests.getAndDigest         SHA-512        16   15   737.719 ±  53.671  ops/ms
>> MessageDigests.getAndDigest         SHA-512      1024   15   112.307 ±   1.950  ops/ms
>>
>> GC:
>> MessageDigests.getAndDigest:·gc.alloc.rate.norm      MD5  16  15   312.011 ± 0.005  B/op
>> MessageDigests.getAndDigest:·gc.alloc.rate.norm    SHA-1  16  15   584.020 ± 0.006  B/op
>> MessageDigests.getAndDigest:·gc.alloc.rate.norm  SHA-256  16  15   544.019 ± 0.016  B/op
>> MessageDigests.getAndDigest:·gc.alloc.rate.norm  SHA-512  16  15  1056.037 ± 0.003  B/op
>>
>> Target:
>> Benchmark                    (digesterName)  (length)  Cnt     Score     Error  Units
>> MessageDigests.digest                   MD5        16   15  3134.462 ±  43.685  ops/ms
>> MessageDigests.digest                   MD5      1024   15   323.667 ±   0.633  ops/ms
>> MessageDigests.digest                 SHA-1        16   15  1418.742 ±  38.223  ops/ms
>> MessageDigests.digest                 SHA-1      1024   15   110.178 ±   0.788  ops/ms
>> MessageDigests.digest               SHA-256        16   15  1037.949 ±  21.214  ops/ms
>> MessageDigests.digest               SHA-256      1024   15    89.671 ±   0.228  ops/ms
>> MessageDigests.digest               SHA-512        16   15   812.028 ±  39.489  ops/ms
>> MessageDigests.digest               SHA-512      1024   15   116.738 ±   0.249  ops/ms
>> MessageDigests.getAndDigest             MD5        16   15  2314.379 ± 229.294  ops/ms
>> MessageDigests.getAndDigest             MD5      1024   15   307.835 ±   5.730  ops/ms
>> MessageDigests.getAndDigest           SHA-1        16   15  1326.887 ±  63.263  ops/ms
>> MessageDigests.getAndDigest           SHA-1      1024   15   106.611 ±   2.292  ops/ms
>> MessageDigests.getAndDigest         SHA-256        16   15   961.589 ±  82.052  ops/ms
>> MessageDigests.getAndDigest         SHA-256      1024   15    88.646 ±   0.194  ops/ms
>> MessageDigests.getAndDigest         SHA-512        16   15   775.417 ±  56.775  ops/ms
>> MessageDigests.getAndDigest         SHA-512      1024   15   112.904 ±   2.014  ops/ms
>>
>> GC:
>> MessageDigests.getAndDigest:·gc.alloc.rate.norm      MD5  16  15   232.009 ± 0.006  B/op
>> MessageDigests.getAndDigest:·gc.alloc.rate.norm    SHA-1  16  15   584.021 ± 0.001  B/op
>> MessageDigests.getAndDigest:·gc.alloc.rate.norm  SHA-256  16  15   272.012 ± 0.015  B/op
>> MessageDigests.getAndDigest:·gc.alloc.rate.norm  SHA-512  16  15   400.017 ± 0.019  B/op
>>
>> For the `digest` micro, digesting small inputs is faster with all algorithms,
>> ranging from ~1% for SHA-512 up to ~15% for MD5. The gain stems from not
>> allocating and reading into a temporary buffer once outside of the
>> intrinsic. SHA-1 does not see a statistically significant gain because the
>> intrinsic is disabled by default on my HW.
>>
>> For the `getAndDigest` micro - which tests
>> `MessageDigest.getInstance(..).digest(..)` - there are similar gains with
>> this patch. The interesting aspect here is verifying the reduction in
>> allocations per operation when there's an active intrinsic (again, not for
>> SHA-1). JDK-8259065 (#1933) reduced allocations on each of these by 144B/op,
>> which means allocation pressure for SHA-512 is down two thirds from 1200B/op
>> to 400B/op in this contrived test.
>>
>> I've verified there are no regressions in the absence of the intrinsic -
>> which the SHA-1 numbers here help show.
>
> Claes Redestad has updated the pull request incrementally with one additional
> commit since the last revision:
>
>   Remove unused code

Marked as reviewed by valeriep (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/1855
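
For anyone following along without the diff open: the VarHandle point in the quoted description boils down to reading little-endian ints straight out of the input `byte[]` instead of first decoding the block into a temporary `int[]`. Below is a minimal sketch of that idea, not the actual `ByteArrayAccess` code from the PR. The class name `LittleEndianReadSketch` and both method names are made up for illustration; the only JDK API assumed is `MethodHandles.byteArrayViewVarHandle`.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical illustration class; not part of the JDK or of this PR.
final class LittleEndianReadSketch {

    // A VarHandle that views a byte[] as little-endian ints.
    // MD5 interprets its 64-byte input block as 16 such ints.
    private static final VarHandle LE_INT =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    /** Reads one little-endian int at a byte offset, with no temporary int[]. */
    static int readIntLE(byte[] buf, int offset) {
        return (int) LE_INT.get(buf, offset);
    }

    /** Equivalent shift-and-mask version, kept only for comparison. */
    static int readIntLEManual(byte[] buf, int offset) {
        return (buf[offset] & 0xff)
                | (buf[offset + 1] & 0xff) << 8
                | (buf[offset + 2] & 0xff) << 16
                | (buf[offset + 3] & 0xff) << 24;
    }

    public static void main(String[] args) {
        // Fill one 64-byte block with arbitrary data.
        byte[] block = new byte[64];
        for (int i = 0; i < block.length; i++) {
            block[i] = (byte) (i * 7 + 1);
        }
        // Both readers must agree on each of the 16 ints in the block.
        for (int off = 0; off < block.length; off += 4) {
            if (readIntLE(block, off) != readIntLEManual(block, off)) {
                throw new AssertionError("mismatch at offset " + off);
            }
        }
        // Bytes 78 56 34 12 read as little-endian give 0x12345678.
        byte[] sample = {0x78, 0x56, 0x34, 0x12};
        System.out.println(Integer.toHexString(readIntLE(sample, 0))); // prints 12345678
    }
}
```

With a helper like `readIntLE`, a compression function can pull each of its 16 input words directly from the caller's buffer at the point of use, which is roughly why the per-instance `int[] x` becomes unnecessary for MD5.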