Re: RFR: 8259498: Reduce overhead of MD5 and SHA digests

Valerie Peng Fri, 15 Jan 2021 15:11:47 -0800

On Sun, 20 Dec 2020 20:27:03 GMT, Claes Redestad <redes...@openjdk.org> wrote:


> - The MD5 intrinsics added by 
> [JDK-8250902](https://bugs.openjdk.java.net/browse/JDK-8250902) show that 
> the `int[] x` isn't actually needed. This also applies to the SHA intrinsics 
> from which the MD5 intrinsic takes inspiration.
> - Using VarHandles we can simplify the code in `ByteArrayAccess` enough to 
> make it acceptable to use inline and replace the array in MD5 wholesale. This 
> improves performance both in the presence and the absence of the intrinsic 
> optimization.
> - Doing the exact same thing in the SHA impls would be unwieldy (64+ element 
> arrays), but allocating the array lazily gets most of the speed-up in the 
> presence of an intrinsic while being neutral in its absence.
> 
> Baseline:
> Benchmark                    (digesterName)  (length)  Cnt     Score      Error   Units
> MessageDigests.digest                   MD5        16   15  2714.307 ±   21.133  ops/ms
> MessageDigests.digest                   MD5      1024   15   318.087 ±    0.637  ops/ms
> MessageDigests.digest                 SHA-1        16   15  1387.266 ±   40.932  ops/ms
> MessageDigests.digest                 SHA-1      1024   15   109.273 ±    0.149  ops/ms
> MessageDigests.digest               SHA-256        16   15   995.566 ±   21.186  ops/ms
> MessageDigests.digest               SHA-256      1024   15    89.104 ±    0.079  ops/ms
> MessageDigests.digest               SHA-512        16   15   803.030 ±   15.722  ops/ms
> MessageDigests.digest               SHA-512      1024   15   115.611 ±    0.234  ops/ms
> MessageDigests.getAndDigest             MD5        16   15  2190.367 ±   97.037  ops/ms
> MessageDigests.getAndDigest             MD5      1024   15   302.903 ±    1.809  ops/ms
> MessageDigests.getAndDigest           SHA-1        16   15  1262.656 ±   43.751  ops/ms
> MessageDigests.getAndDigest           SHA-1      1024   15   104.889 ±    3.554  ops/ms
> MessageDigests.getAndDigest         SHA-256        16   15   914.541 ±   55.621  ops/ms
> MessageDigests.getAndDigest         SHA-256      1024   15    85.708 ±    1.394  ops/ms
> MessageDigests.getAndDigest         SHA-512        16   15   737.719 ±   53.671  ops/ms
> MessageDigests.getAndDigest         SHA-512      1024   15   112.307 ±    1.950  ops/ms
> 
> GC:
> MessageDigests.getAndDigest:·gc.alloc.rate.norm      MD5        16   15   312.011 ±    0.005    B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm    SHA-1        16   15   584.020 ±    0.006    B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm  SHA-256        16   15   544.019 ±    0.016    B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm  SHA-512        16   15  1056.037 ±    0.003    B/op
> 
> Target:
> Benchmark                    (digesterName)  (length)  Cnt     Score      Error   Units
> MessageDigests.digest                   MD5        16   15  3134.462 ±   43.685  ops/ms
> MessageDigests.digest                   MD5      1024   15   323.667 ±    0.633  ops/ms
> MessageDigests.digest                 SHA-1        16   15  1418.742 ±   38.223  ops/ms
> MessageDigests.digest                 SHA-1      1024   15   110.178 ±    0.788  ops/ms
> MessageDigests.digest               SHA-256        16   15  1037.949 ±   21.214  ops/ms
> MessageDigests.digest               SHA-256      1024   15    89.671 ±    0.228  ops/ms
> MessageDigests.digest               SHA-512        16   15   812.028 ±   39.489  ops/ms
> MessageDigests.digest               SHA-512      1024   15   116.738 ±    0.249  ops/ms
> MessageDigests.getAndDigest             MD5        16   15  2314.379 ±  229.294  ops/ms
> MessageDigests.getAndDigest             MD5      1024   15   307.835 ±    5.730  ops/ms
> MessageDigests.getAndDigest           SHA-1        16   15  1326.887 ±   63.263  ops/ms
> MessageDigests.getAndDigest           SHA-1      1024   15   106.611 ±    2.292  ops/ms
> MessageDigests.getAndDigest         SHA-256        16   15   961.589 ±   82.052  ops/ms
> MessageDigests.getAndDigest         SHA-256      1024   15    88.646 ±    0.194  ops/ms
> MessageDigests.getAndDigest         SHA-512        16   15   775.417 ±   56.775  ops/ms
> MessageDigests.getAndDigest         SHA-512      1024   15   112.904 ±    2.014  ops/ms
> 
> GC:
> MessageDigests.getAndDigest:·gc.alloc.rate.norm      MD5        16   15   232.009 ±    0.006    B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm    SHA-1        16   15   584.021 ±    0.001    B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm  SHA-256        16   15   272.012 ±    0.015    B/op
> MessageDigests.getAndDigest:·gc.alloc.rate.norm  SHA-512        16   15   400.017 ±    0.019    B/op
> 
> For the `digest` micro, digesting small inputs is faster with all algorithms, 
> ranging from ~1% for SHA-512 up to ~15% for MD5. The gain stems from not 
> allocating and reading into a temporary buffer once outside of the intrinsic. 
> SHA-1 does not see a statistically significant gain because the intrinsic is 
> disabled by default on my HW.
> 
> For the `getAndDigest` micro, which tests 
> `MessageDigest.getInstance(..).digest(..)`, there are similar gains with this 
> patch. The interesting aspect here is verifying the reduction in allocations 
> per operation when there's an active intrinsic (again, not for SHA-1). 
> JDK-8259065 (#1933) reduced allocations on each of these by 144B/op, which 
> means allocation pressure for SHA-512 is down by two thirds, from 1200B/op to 
> 400B/op, in this contrived test.
> 
> I've verified there are no regressions in the absence of the intrinsic, 
> which the SHA-1 numbers here help show.
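
For readers following along, here is a minimal sketch of the kind of VarHandle-based read the description refers to. It is not the code from the patch: the class and method names (`LittleEndianRead`, `readVarHandle`, `readManual`) are made up for illustration, and the only JDK API assumed is `MethodHandles.byteArrayViewVarHandle`. It checks that a little-endian `int` view over a `byte[]` gives the same result as assembling the value by hand, which is the byte order MD5 uses for its input block.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical illustration class; not part of sun.security.provider.
final class LittleEndianRead {
    // Views a byte[] as a sequence of little-endian ints.
    private static final VarHandle INT_LE =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    // Reads 4 bytes starting at the given byte offset through the VarHandle.
    static int readVarHandle(byte[] buf, int offset) {
        return (int) INT_LE.get(buf, offset);
    }

    // Same read, assembled manually from individual bytes.
    static int readManual(byte[] buf, int offset) {
        return (buf[offset] & 0xff)
                | ((buf[offset + 1] & 0xff) << 8)
                | ((buf[offset + 2] & 0xff) << 16)
                | ((buf[offset + 3] & 0xff) << 24);
    }

    public static void main(String[] args) {
        byte[] block = new byte[64]; // one MD5-sized input block
        for (int i = 0; i < block.length; i++) {
            block[i] = (byte) (i * 7 + 3);
        }
        for (int off = 0; off <= block.length - 4; off += 4) {
            if (readVarHandle(block, off) != readManual(block, off)) {
                throw new AssertionError("mismatch at offset " + off);
            }
        }
        System.out.println("all 4-byte reads match");
    }
}
```

With reads this short, the words of a block can be pulled straight from the input array where they are needed, which is presumably what makes dropping the temporary `int[]` practical.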

src/java.base/share/classes/sun/security/provider/ByteArrayAccess.java line 214:


Why do we remove the index checking from all methods? Isn't it safer to check 
here in case the caller didn't? Or is such checking already implemented 
inside the various methods of VarHandle?
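
As a data point for that last question, a minimal sketch (the names `VarHandleBounds` and `throwsOutOfBounds` are hypothetical, and this says nothing about what `ByteArrayAccess` itself does after the patch): a VarHandle obtained from `MethodHandles.byteArrayViewVarHandle` is documented to throw `IndexOutOfBoundsException` when the index is negative or greater than the array length minus the element size, so a plain `get` through it is still bounds-checked.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

// Hypothetical illustration class; not part of the patch under review.
final class VarHandleBounds {
    private static final VarHandle INT_LE =
            MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

    // Returns true if reading a 4-byte int at the given byte offset is rejected.
    static boolean throwsOutOfBounds(byte[] buf, int offset) {
        try {
            int ignored = (int) INT_LE.get(buf, offset);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        byte[] buf = new byte[8];
        // Offset 4 is the last position where 4 bytes still fit in 8.
        System.out.println("offset 4 rejected: " + throwsOutOfBounds(buf, 4));
        // Offset 5 would read past the end of the array.
        System.out.println("offset 5 rejected: " + throwsOutOfBounds(buf, 5));
        // Negative offsets are rejected as well.
        System.out.println("offset -1 rejected: " + throwsOutOfBounds(buf, -1));
    }
}
```

This only covers single-element accesses. Whether the range checks over a whole block (offset plus length) remain covered after removing the explicit checks is a separate question for the patch author.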

-------------

PR: https://git.openjdk.java.net/jdk/pull/1855
