> - The MD5 intrinsics added by 
> [JDK-8250902](https://bugs.openjdk.java.net/browse/JDK-8250902) show that 
> the `int[] x` isn't actually needed. This also applies to the SHA intrinsics 
> from which the MD5 intrinsic takes inspiration.
> - Using VarHandles we can simplify the code in `ByteArrayAccess` enough to 
> make it acceptable to use inline and replace the array in MD5 wholesale. This 
> improves performance both in the presence and the absence of the intrinsic 
> optimization.
> - Doing the exact same thing in the SHA impls would be unwieldy (64+ element 
> arrays), but allocating the array lazily gets most of the speed-up in the 
> presence of an intrinsic while being neutral in its absence.
> 
> Baseline:
> Benchmark                                 (digesterName)  (length)    Cnt     Score      Error   Units
> MessageDigests.digest                                MD5        16     15  2714.307 ±   21.133  ops/ms
> MessageDigests.digest                                MD5      1024     15   318.087 ±    0.637  ops/ms
> MessageDigests.digest                              SHA-1        16     15  1387.266 ±   40.932  ops/ms
> MessageDigests.digest                              SHA-1      1024     15   109.273 ±    0.149  ops/ms
> MessageDigests.digest                            SHA-256        16     15   995.566 ±   21.186  ops/ms
> MessageDigests.digest                            SHA-256      1024     15    89.104 ±    0.079  ops/ms
> MessageDigests.digest                            SHA-512        16     15   803.030 ±   15.722  ops/ms
> MessageDigests.digest                            SHA-512      1024     15   115.611 ±    0.234  ops/ms
> MessageDigests.getAndDigest                          MD5        16     15  2190.367 ±   97.037  ops/ms
> MessageDigests.getAndDigest                          MD5      1024     15   302.903 ±    1.809  ops/ms
> MessageDigests.getAndDigest                        SHA-1        16     15  1262.656 ±   43.751  ops/ms
> MessageDigests.getAndDigest                        SHA-1      1024     15   104.889 ±    3.554  ops/ms
> MessageDigests.getAndDigest                      SHA-256        16     15   914.541 ±   55.621  ops/ms
> MessageDigests.getAndDigest                      SHA-256      1024     15    85.708 ±    1.394  ops/ms
> MessageDigests.getAndDigest                      SHA-512        16     15   737.719 ±   53.671  ops/ms
> MessageDigests.getAndDigest                      SHA-512      1024     15   112.307 ±    1.950  ops/ms
> 
> GC:
> MessageDigests.getAndDigest:·gc.alloc.rate.norm      MD5        16     15   312.011 ±    0.005    B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm    SHA-1        16     15   584.020 ±    0.006    B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm  SHA-256        16     15   544.019 ±    0.016    B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm  SHA-512        16     15  1056.037 ±    0.003    B/op
> 
> Target:
> Benchmark                                 (digesterName)  (length)    Cnt     Score      Error   Units
> MessageDigests.digest                                MD5        16     15  3134.462 ±   43.685  ops/ms
> MessageDigests.digest                                MD5      1024     15   323.667 ±    0.633  ops/ms
> MessageDigests.digest                              SHA-1        16     15  1418.742 ±   38.223  ops/ms
> MessageDigests.digest                              SHA-1      1024     15   110.178 ±    0.788  ops/ms
> MessageDigests.digest                            SHA-256        16     15  1037.949 ±   21.214  ops/ms
> MessageDigests.digest                            SHA-256      1024     15    89.671 ±    0.228  ops/ms
> MessageDigests.digest                            SHA-512        16     15   812.028 ±   39.489  ops/ms
> MessageDigests.digest                            SHA-512      1024     15   116.738 ±    0.249  ops/ms
> MessageDigests.getAndDigest                          MD5        16     15  2314.379 ±  229.294  ops/ms
> MessageDigests.getAndDigest                          MD5      1024     15   307.835 ±    5.730  ops/ms
> MessageDigests.getAndDigest                        SHA-1        16     15  1326.887 ±   63.263  ops/ms
> MessageDigests.getAndDigest                        SHA-1      1024     15   106.611 ±    2.292  ops/ms
> MessageDigests.getAndDigest                      SHA-256        16     15   961.589 ±   82.052  ops/ms
> MessageDigests.getAndDigest                      SHA-256      1024     15    88.646 ±    0.194  ops/ms
> MessageDigests.getAndDigest                      SHA-512        16     15   775.417 ±   56.775  ops/ms
> MessageDigests.getAndDigest                      SHA-512      1024     15   112.904 ±    2.014  ops/ms
> 
> GC:
> MessageDigests.getAndDigest:·gc.alloc.rate.norm      MD5        16     15   232.009 ±    0.006    B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm    SHA-1        16     15   584.021 ±    0.001    B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm  SHA-256        16     15   272.012 ±    0.015    B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm  SHA-512        16     15   400.017 ±    0.019    B/op
> 
> For the `digest` micro, digesting small inputs is faster with all algorithms, 
> ranging from ~1% for SHA-512 up to ~15% for MD5. The gain stems from not 
> allocating and reading into a temporary buffer once outside of the intrinsic. 
> SHA-1 does not see a statistically significant gain because the intrinsic is 
> disabled by default on my HW.
> 
> For the `getAndDigest` micro, which tests 
> `MessageDigest.getInstance(..).digest(..)`, there are similar gains with this 
> patch. The interesting aspect here is verifying the reduction in allocations 
> per operation when there's an active intrinsic (again, not for SHA-1). 
> JDK-8259065 (#1933) reduced allocations on each of these by 144B/op, which 
> means allocation pressure for SHA-512 is down two thirds, from 1200B/op to 
> 400B/op, in this contrived test.
> 
> I've verified there are no regressions in the absence of the intrinsic - 
> which the SHA-1 numbers here help show.
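
For readers unfamiliar with the VarHandle approach mentioned in the quoted
description, here is a minimal sketch. It is not the code from the pull
request: the class and method names (`LittleEndianReads`, `readIntLE`,
`readIntManual`) are hypothetical, and only
`MethodHandles.byteArrayViewVarHandle` is an actual JDK API. It shows a
little-endian `int` being read directly from a `byte[]`, the kind of access
that can make a temporary `int[]` copy of the input unnecessary.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical illustration class; not part of the JDK or of the patch.
public class LittleEndianReads {
    // Views a byte[] as little-endian ints. The index passed to get() is a
    // byte offset into the array, and the VarHandle performs bounds checks.
    private static final VarHandle LE_INT =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    // Reads 4 bytes starting at 'offset' as one little-endian int.
    public static int readIntLE(byte[] buf, int offset) {
        return (int) LE_INT.get(buf, offset);
    }

    // Reference implementation using explicit shifts, for comparison only.
    public static int readIntManual(byte[] buf, int offset) {
        return (buf[offset] & 0xff)
                | (buf[offset + 1] & 0xff) << 8
                | (buf[offset + 2] & 0xff) << 16
                | (buf[offset + 3] & 0xff) << 24;
    }

    public static void main(String[] args) {
        // A 64-byte block, the input block size used by MD5.
        byte[] block = new byte[64];
        for (int i = 0; i < block.length; i++) {
            block[i] = (byte) (i * 7 + 3);
        }
        // Both readers must agree on every 4-byte word of the block.
        for (int off = 0; off <= block.length - 4; off += 4) {
            if (readIntLE(block, off) != readIntManual(block, off)) {
                throw new AssertionError("mismatch at offset " + off);
            }
        }
        System.out.println("all words match");
    }
}
```

Whether such reads end up as cheap as a hand-written loop depends on the JIT;
the benchmark numbers quoted above are the author's evidence for the actual
change.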

Claes Redestad has updated the pull request with a new target base due to a 
merge or a rebase. The incremental webrev excludes the unrelated changes 
brought in by the merge/rebase. The pull request contains 20 additional commits 
since the last revision:

 - Copyrights
 - Merge branch 'master' into improve_md5
 - Remove unused Unsafe import
 - Harmonize MD4 impl, remove now-redundant checks from ByteArrayAccess (VHs do 
bounds checks, most of which will be optimized away)
 - Merge branch 'master' into improve_md5
 - Apply allocation avoiding optimizations to all SHA versions sharing 
structural similarities with MD5
 - Remove unused reverseBytes imports
 - Copyrights
 - Fix copy-paste error
 - Various fixes (IDE stopped IDEing..)
 - ... and 10 more: https://git.openjdk.java.net/jdk/compare/03e99844...cafa3e49

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/1855/files
  - new: https://git.openjdk.java.net/jdk/pull/1855/files/e1c943c5..cafa3e49

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1855&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1855&range=00-01

  Stats: 28760 lines in 1103 files changed: 16020 ins; 7214 del; 5526 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1855.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1855/head:pull/1855

PR: https://git.openjdk.java.net/jdk/pull/1855
