aherbert commented on PR #140:
URL: https://github.com/apache/commons-numbers/pull/140#issuecomment-1974831748
Installed the numbers-core module locally (forgot to do this at first).
MacBook Pro M2:
```
mvn -v
Apache Maven 3.9.4 (dfbb324ad4a7c8fb0bf182e6d91b0ae20e3d2dd9)
Maven home: /Users/ah403/mvn/mvn
Java version: 21.0.1, vendor: Eclipse Adoptium, runtime:
/Library/Java/JavaVirtualMachines/temurin-21.jdk/Contents/Home
Default locale: en_GB, platform encoding: UTF-8
OS name: "mac os x", version: "14.2.1", arch: "aarch64", family: "mac"
```
I also checked JDK 11 and saw the same results, so using JDK 21 is not the
reason for the difference.
When benchmarking it is recommended to avoid `final` or fixed input, as the
JVM may recognise constant data and optimise for it. So I would prefer the
benchmark without final or fixed data.
I changed the benchmark to build new data each iteration:
```java
// int benchmark state
@Setup(Level.Iteration)
public void setup() {
    values = getRandomProvider(seed).ints()
        .filter(i -> i != Integer.MIN_VALUE)
        .limit(numPairs * 2)
        .toArray();
    seed = (((long) values[0]) << Integer.SIZE) | values[1];
}

// long benchmark state
@Setup(Level.Iteration)
public void setup() {
    values = getRandomProvider(seed).longs()
        .filter(i -> i != Long.MIN_VALUE)
        .limit(numPairs * 2)
        .toArray();
    seed = values[0];
}
```
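One aside on the re-seeding line in the int variant: in Java, OR-ing an `int` into a `long` sign-extends the `int` first, so a negative low word sets all of the upper 32 bits and overwrites the high word. For a benchmark seed this is harmless, but for reference here is a hypothetical masked form of the packing (an illustrative helper, not part of the benchmark):

```java
public class PackInts {
    // Pack two ints into one long. Masking the low word with 0xFFFFFFFFL
    // prevents sign extension from clobbering the high 32 bits.
    static long pack(int hi, int lo) {
        return ((long) hi << Integer.SIZE) | (lo & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        long packed = pack(42, -1);
        // The high word survives because the negative low word was masked.
        System.out.println((int) (packed >>> 32) == 42);  // true
        System.out.println((int) packed == -1);           // true
    }
}
```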
I find you get more stable timings with more than 1000 pairs. Here are
100000 pairs reseeded each time:
```
Benchmark                               (numPairs)  (seed)   Mode  Cnt    Score   Error  Units
GcdPerformance.gcdBigInteger                100000      42  thrpt   10   21.217 ± 0.117  ops/s
GcdPerformance.gcdInt                       100000      42  thrpt   10  231.625 ± 2.014  ops/s
GcdPerformance.gcdLong                      100000      42  thrpt   10  115.497 ± 0.446  ops/s
GcdPerformance.oldGcdInt                    100000      42  thrpt   10  253.902 ± 2.531  ops/s
GcdPerformance.oldGcdIntAdaptedForLong      100000      42  thrpt   10   68.111 ± 0.909  ops/s
GcdPerformance.oldGcdLong                   100000      42  thrpt   10   29.181 ± 0.037  ops/s
```
I see that `oldGcdInt` is faster than the new `gcdInt`. The
`oldGcdIntAdaptedForLong` is outperformed by the new `gcdLong`. This is
interesting given the trivial difference between them.
```
# average time
java -jar target/examples-jmh.jar Gcd -pnumPairs=100000 -bm avgt -tu ns
Benchmark                               (numPairs)  (seed)  Mode  Cnt         Score        Error  Units
GcdPerformance.gcdBigInteger                100000      42  avgt   10  47321374.441 ± 316079.389  ns/op
GcdPerformance.gcdInt                       100000      42  avgt   10   4365814.289 ±  34222.618  ns/op
GcdPerformance.gcdLong                      100000      42  avgt   10   8711690.051 ±  25766.036  ns/op
GcdPerformance.oldGcdInt                    100000      42  avgt   10   3945841.489 ±  35975.472  ns/op
GcdPerformance.oldGcdIntAdaptedForLong      100000      42  avgt   10  14703364.077 ± 539905.259  ns/op
GcdPerformance.oldGcdLong                   100000      42  avgt   10  34265752.087 ± 206487.358  ns/op
```
The average-time results show the same pattern.
So on this machine it seems that the `oldGcdInt` is OK, but adapting that
method for `long` requires the change to not use `while (a != b)`.
If this is repeatable on other machines then I think the new `gcdLong`
implementation should have a comment to state why it is different from the
`gcdInt`, e.g.
```
// This method is intentionally different from the int gcd implementation.
// Benchmarking shows the test for long inequality (a != b) is slow compared to
// testing the delta for equality to zero. The same change on the int gcd
// reduces performance there, hence we have two variants of the method.
```
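To make the difference concrete, here is a minimal sketch of the two binary-gcd loop shapes being compared (simplified to non-negative inputs; these are illustrative variants, not the actual Commons Numbers code):

```java
public class GcdLoopShapes {
    // Loop shape of the old implementation: terminate on long inequality (a != b).
    static long gcdWhileNotEqual(long a, long b) {
        if (a == 0) return b;
        if (b == 0) return a;
        final int shift = Long.numberOfTrailingZeros(a | b);
        // Remove factors of two; both values are odd inside the loop.
        a >>= Long.numberOfTrailingZeros(a);
        b >>= Long.numberOfTrailingZeros(b);
        while (a != b) {  // the long inequality test under discussion
            if (a > b) {
                a -= b;
                a >>= Long.numberOfTrailingZeros(a);
            } else {
                b -= a;
                b >>= Long.numberOfTrailingZeros(b);
            }
        }
        return a << shift;
    }

    // Loop shape of the new implementation: terminate when the delta is zero.
    static long gcdWhileDeltaNotZero(long a, long b) {
        if (a == 0) return b;
        if (b == 0) return a;
        final int shift = Long.numberOfTrailingZeros(a | b);
        a >>= Long.numberOfTrailingZeros(a);
        b >>= Long.numberOfTrailingZeros(b);
        long delta;
        while ((delta = a - b) != 0) {  // compare the delta to zero instead
            b = Math.min(a, b);
            a = Math.abs(delta);
            a >>= Long.numberOfTrailingZeros(a);  // delta of two odd values is even
        }
        return a << shift;
    }

    public static void main(String[] args) {
        // Both variants agree with BigInteger.gcd on sample pairs.
        long[][] pairs = { {12, 18}, {0, 7}, {35, 49}, {1L << 20, 3} };
        for (long[] p : pairs) {
            long expected = java.math.BigInteger.valueOf(p[0])
                .gcd(java.math.BigInteger.valueOf(p[1])).longValueExact();
            System.out.println(GcdLoopShapes.gcdWhileNotEqual(p[0], p[1]) == expected
                && GcdLoopShapes.gcdWhileDeltaNotZero(p[0], p[1]) == expected);
        }
    }
}
```

The two methods compute the same result by the same subtract-and-shift scheme; only the loop-exit comparison differs, which is what makes the benchmark gap notable.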
I thought it might be because the loop executes more often for 64-bit numbers
and the change aided pipelining, as the delta is required more often. So I
recompiled with the long data limited to 32-bit (int) values. However, the
difference is still there:
```
Benchmark                               (numPairs)  (seed)   Mode  Cnt    Score   Error  Units
GcdPerformance.gcdBigInteger                100000      42  thrpt   10   96.059 ± 2.658  ops/s
GcdPerformance.gcdInt                       100000      42  thrpt   10  230.165 ± 0.501  ops/s
GcdPerformance.gcdLong                      100000      42  thrpt   10  218.651 ± 0.516  ops/s
GcdPerformance.oldGcdInt                    100000      42  thrpt   10  252.595 ± 0.822  ops/s
GcdPerformance.oldGcdIntAdaptedForLong      100000      42  thrpt   10  131.465 ± 0.451  ops/s
GcdPerformance.oldGcdLong                   100000      42  thrpt   10   58.857 ± 0.123  ops/s
```
Note that int-sized data makes `BigInteger` about 4x faster (it has a smaller
representation in memory) and the long implementations about 2x faster, so the
loop execution count is approximately halved. The new long gcd version
approaches the speed of the int versions.
Since the speed difference is still there, it may be that equality comparison
between longs is slower than comparison to zero, but I did not investigate
further. Note that to test whether a long is non-zero you can fold the upper
and lower 32 bits into a single int and test that it is non-zero. So perhaps
comparison of a long to zero is faster due to such an optimisation, but I'm
not familiar with what goes on at the hardware level.
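The bit-folding test mentioned above can be written as follows (an illustrative snippet of the idea, not something taken from the JDK or this PR):

```java
public class LongNonZero {
    // Fold the upper and lower 32 bits into one int; the OR is non-zero
    // if and only if the original long is non-zero.
    static boolean isNonZero(long x) {
        return ((int) x | (int) (x >>> 32)) != 0;
    }

    public static void main(String[] args) {
        System.out.println(isNonZero(0L));              // false
        System.out.println(isNonZero(1L));              // true
        System.out.println(isNonZero(1L << 40));        // true: only the high word is set
        System.out.println(isNonZero(Long.MIN_VALUE));  // true
    }
}
```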
I will repeat the benchmark on another machine when I go back to work on
Monday. I find the M2 processor can often fail to show differences that are
seen on my old Xeon workstation processor, i.e. make sure to test on the worst
processor(s) you have access to.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]