aherbert commented on PR #139:
URL: https://github.com/apache/commons-numbers/pull/139#issuecomment-1975187671

   I've obtained a copy of Hacker's Delight 2.0. Section 9.3 contains details 
of how to implement unsigned divide. The unsigned modulus is not provided but I 
have reworked the current code based on the divide code and the old remainder 
code.
   
   JDK 17
   
   ```
   Benchmark                     (length)                  (name)  Mode  Cnt    
Score    Error  Units
   ArithmeticPerformance.longOp      1024     Long.divideUnsigned  avgt    5  
400.189 ±  8.950  ns/op
   ArithmeticPerformance.longOp      1024      divideUnsigned_1.0  avgt    5  
583.359 ± 41.198  ns/op
   ArithmeticPerformance.longOp      1024      divideUnsigned_1.1  avgt    5  
402.176 ± 14.229  ns/op
   ArithmeticPerformance.longOp      1024  Long.remainderUnsigned  avgt    5  
442.659 ±  6.195  ns/op
   ArithmeticPerformance.longOp      1024   remainderUnsigned_1.0  avgt    5  
547.328 ± 69.341  ns/op
   ArithmeticPerformance.longOp      1024   remainderUnsigned_1.1  avgt    5  
438.215 ± 16.554  ns/op
   ```
   
   So the new long version is approximately 25-30% faster and matches the JDK.
   
   JDK 21 (Long.remainderUnsigned and Long.divideUnsigned are intrinsic methods 
from JDK 19)
   
   ```
   Benchmark                     (length)                  (name)  Mode  Cnt    
Score    Error  Units
   ArithmeticPerformance.longOp      1024     Long.divideUnsigned  avgt    5  
297.357 ±  3.524  ns/op
   ArithmeticPerformance.longOp      1024      divideUnsigned_1.0  avgt    5  
600.204 ± 37.312  ns/op
   ArithmeticPerformance.longOp      1024      divideUnsigned_1.1  avgt    5  
428.529 ± 22.745  ns/op
   ArithmeticPerformance.longOp      1024  Long.remainderUnsigned  avgt    5  
296.415 ±  1.678  ns/op
   ArithmeticPerformance.longOp      1024   remainderUnsigned_1.0  avgt    5  
511.435 ± 18.497  ns/op
   ArithmeticPerformance.longOp      1024   remainderUnsigned_1.1  avgt    5  
433.988 ± 34.247  ns/op
   ```
   
   Here the JDK is faster due to the intrinsic methods (or some other 
optimisation - I didn't check the source code).
   
   I have updated the int version to delegate to 
Integer.remainderUnsigned/divideUnsigned with the rather obvious result that it 
is now as fast as the JDK (which uses long arithmetic):
   
   ```
   Benchmark                    (length)                     (name)  Mode  Cnt  
  Score    Error  Units
   ArithmeticPerformance.intOp      1024     Integer.divideUnsigned  avgt    5  
297.178 ±  1.721  ns/op
   ArithmeticPerformance.intOp      1024         divideUnsigned_1.0  avgt    5  
588.749 ± 45.638  ns/op
   ArithmeticPerformance.intOp      1024         divideUnsigned_1.1  avgt    5  
296.653 ±  1.402  ns/op
   ArithmeticPerformance.intOp      1024  Integer.remainderUnsigned  avgt    5  
296.926 ±  3.817  ns/op
   ArithmeticPerformance.intOp      1024      remainderUnsigned_1.0  avgt    5  
518.333 ± 19.581  ns/op
   ArithmeticPerformance.intOp      1024      remainderUnsigned_1.1  avgt    5  
297.276 ±  4.458  ns/op
   ```
   
   Changes added to master.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to