JDK 8 RFR 4891331: BigInteger a.multiply(a) should use squaring code

Brian Burkhalter Wed, 23 Oct 2013 17:38:50 -0700

Please review this patch at your convenience:

Issue:  https://bugs.openjdk.java.net/browse/JDK-4891331
Webrev: http://cr.openjdk.java.net/~bpb/4891331/webrev/


This review request follows from this RFC thread:

http://mail.openjdk.java.net/pipermail/core-libs-dev/2013-October/022266.html

More extensive micro-benchmark testing was performed using JMH [1] for the 
following cases:

1) multiply: current version of multiply()
2) multiplyNonSquare: proposed version of multiply() with parameter val such 
that (val != this && val.equals(this)) == true
3) multiplyPowTwo: multiply by val == this with squaring performed via pow(2)
4) multiplySquare: proposed version of multiply() with parameter val such that 
val == this
5) powTwo: straightforward call of pow(2)
6) square: straightforward call of square()

Test #1 represents current performance of multiplying two identical instances 
without squaring code.
Test #2 represents performance after the patch when squaring code is *not* 
used, i.e., measures effect of val == this equality check.
Test #4 represents performance after the patch when squaring code *is* used for 
the case of mag.length >= 8.
Tests #3, #5, and #6 are for informative purposes.

The following bit lengths were tested:

32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 384, 416, 448, 480, 512, 768, 
1024, 1280, 1536.

The following JMH parameters were used across all runs:

# Forks: 5
# Warmup: 5 iterations, 1 s each
# Measurement: 8 iterations, 3 s each
# Benchmark mode: Throughput, ops/time

Two sets of runs were performed on each of two platforms, one single-threaded 
and one with number of threads equal to the number of hardware cores, 4 in one 
case (Linux) and 8 in the other (Mac OS X).

Linux system: 4 X Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz.
Mac system: 8 X Intel(R) Xeon(R) CPU           E5520  @ 2.27GHz

Summaries of the aggregate tests results are available at [2], [3], [4], and 
[5] in terms of 99% confidence intervals in operations per millisecond.

The results suggest that the effect of the equality test (val == this) is 
insignificant. Significant performance improvement was however observed for 
squaring of values with int array magnitude length >= 8. The performance gain 
was up to 150-200% for the largest bit length tested (1536). Larger bit lengths 
were not tested as those would enter the current range of Karatsuba 
multiplication (>= 1600 bits).

While not pertinent to this review per se, it should be noted that the 
performance of pow(2) approaches that of square() as the bit length increases. 
It might be worth a subsequent investigation to determine whether there is a 
second threshold beyond which pow(2) overtakes square() altogether.

Results for the same set of tests on a Windows 4 X Core-i7 860 2.8Ghz should be 
available tomorrow. More limited testing previously performed on Windows 
produced in relative terms results similar to the those obtained on Linux and 
Mac OS X.

Lastly, I am not sure that the @implNote is necessary or even desirable here 
and if so whether a CCC request would be in order. Also note that if there were 
any doubt as to the exact threshold which ought to be used, then a slightly 
higher value than 8 could be used or the threshold test could be changed to a 
strict ">" instead of ">=".

Thanks,

Brian

[1] http://openjdk.java.net/projects/code-tools/jmh/
[2] http://cr.openjdk.java.net/~bpb/4891331/linux-threads-1.html
[3] http://cr.openjdk.java.net/~bpb/4891331/linux-threads-4.html
[4] http://cr.openjdk.java.net/~bpb/4891331/macosx-threads-1.html
[5] http://cr.openjdk.java.net/~bpb/4891331/macosx-threads-8.html

JDK 8 RFR 4891331: BigInteger a.multiply(a) should use squaring code

Reply via email to