I have spent some time trying to assess the performance improvements to be 
gained by applying the patch

http://cr.openjdk.java.net/~bpb/4837946/

which I previously posted for code review. It is clear from the literature and 
from analysis of the algorithms involved that significant performance 
improvements should be obtained as the bit lengths of the factors increase. 
This also has been verified by other core-libs-dev members in their own 
testing. I have indeed observed performance improvements, but the bit lengths 
at which the faster algorithms begin to pay off are vastly different from the 
thresholds currently defined in the updated BigInteger class. I am posting this 
message as a sort of "sanity check" of my performance evaluation, as I think it 
is more likely that any unusual results are due to my measurements than to the 
tested code itself.

For performance testing I have thus far exclusively used my regular development 
platform, which is a 2.8 GHz MacBookPro5,3 [1] running Mac OS X 10.7.5 (Lion), 
upgraded, for what it's worth, to 8 GB of RAM and a 7200 RPM HDD.

For micro-benchmarking I have primarily used the Java Microbenchmark Harness 
(JMH) [2], although I wrote some standalone test code as well. Also, I 
constrained testing for the most part to the single-threaded case, although 
relative performance between algorithms was similar for two threads.
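
For concreteness, a minimal JMH benchmark along the lines of what I ran might 
look like the sketch below. The class name, parameter values, and seed are 
illustrative only, and it exercises the stock multiply() entry point rather 
than the per-algorithm methods described further down.

    import java.math.BigInteger;
    import java.util.Random;
    import java.util.concurrent.TimeUnit;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.OutputTimeUnit;
    import org.openjdk.jmh.annotations.Param;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.Setup;
    import org.openjdk.jmh.annotations.State;

    @State(Scope.Thread)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    public class BigIntegerMultiplyBench {

        // Bit length of each factor; values chosen to straddle the 1600-bit
        // Karatsuba threshold in the patched BigInteger.
        @Param({"800", "1600", "3200", "6400", "12800"})
        public int bits;

        BigInteger x;
        BigInteger y;

        @Setup
        public void setup() {
            // Fixed seed so every fork multiplies the same operands; setBit
            // forces each factor to have exactly 'bits' bits.
            Random rnd = new Random(42);
            x = new BigInteger(bits, rnd).setBit(bits - 1);
            y = new BigInteger(bits, rnd).setBit(bits - 1);
        }

        @Benchmark
        public BigInteger multiply() {
            // Returning the product keeps the JIT from eliminating the work.
            return x.multiply(y);
        }
    }

Returning the product from the benchmark method avoids dead-code elimination 
without needing an explicit Blackhole parameter.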

I am only going to quote a couple of results here pending feedback of a more 
general nature about performance testing.

1) First, for single-threaded multiplication of a 1601-bit BigInteger by a 
2000-bit BigInteger I am seeing that the Karatsuba algorithm is 60-70% slower 
than the standard algorithm, and 3-way Toom-Cook is slower still. The 60-70% 
ratio is consistent between the JMH-based tests and the standalone tests. A bit 
length of 1600 for both factors is the threshold at which Karatsuba 
multiplication should kick in.

2) Based on a number of runs with different bit lengths, I did not see the 
Karatsuba algorithm start to outperform the standard algorithm until bit 
lengths somewhere between 10400 and 12800. This is about 6.5-8 times the 
current threshold.

Note that these numbers are based on a modified version of BigInteger I created 
for testing. This modified version contains three public methods 
multiply_standard(), multiply_karatsuba(), and multiply_toom_cook(). The first 
of these is the usual standard algorithm. The second uses the Karatsuba 
algorithm until EITHER factor is below the threshold, then falls back to the 
standard algorithm. The third uses the Toom-Cook algorithm until BOTH factors 
are below the threshold, then likewise falls back to the standard algorithm. 
Keeping these methods separate avoids mixing the accelerated algorithms within 
a single test.
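
For reference, my standalone comparisons are in the spirit of the sketch below. 
It is illustrative only: the class name, iteration counts, and bit-length range 
are mine, and it assumes the modified BigInteger containing the methods above 
has been substituted for the stock class (e.g. via -Xbootclasspath/p), since 
multiply_standard() and multiply_karatsuba() do not exist in the unmodified 
JDK.

    import java.math.BigInteger;
    import java.util.Random;

    public class MultiplyCrossover {

        public static void main(String[] args) {
            Random rnd = new Random(42);
            for (int bits = 1600; bits <= 25600; bits *= 2) {
                // Force both factors to have exactly 'bits' bits.
                BigInteger x = new BigInteger(bits, rnd).setBit(bits - 1);
                BigInteger y = new BigInteger(bits, rnd).setBit(bits - 1);

                long std = avgNanos(x, y, true);
                long kar = avgNanos(x, y, false);
                System.out.printf("%6d bits: standard %9d ns/op, karatsuba %9d ns/op%n",
                                  bits, std, kar);
            }
        }

        // Crude average-time measurement: warm up first, then time a fixed
        // number of iterations, folding results into a checksum so the
        // multiplications cannot be optimized away.
        static long avgNanos(BigInteger x, BigInteger y, boolean standard) {
            final int warmup = 1_000, measured = 5_000;
            int checksum = 0;
            for (int i = 0; i < warmup; i++) {
                checksum ^= multiply(x, y, standard).bitCount();
            }
            long start = System.nanoTime();
            for (int i = 0; i < measured; i++) {
                checksum ^= multiply(x, y, standard).bitCount();
            }
            long elapsed = System.nanoTime() - start;
            System.err.println("checksum " + checksum);
            return elapsed / measured;
        }

        // multiply_standard() and multiply_karatsuba() are the public methods
        // added to the modified BigInteger described above; they are not part
        // of the stock JDK class.
        static BigInteger multiply(BigInteger x, BigInteger y, boolean standard) {
            return standard ? x.multiply_standard(y) : x.multiply_karatsuba(y);
        }
    }

Each run exercises a single algorithm end to end, which is the point of keeping 
the three methods separate; the crude warmup loop is of course only a rough 
substitute for what JMH does.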

Being quite skeptical of my own results, I have the following questions at this 
point:

A) Is there some particular approach that should be used in testing these 
algorithms for relative performance?

B) Could there be something platform-dependent happening?

C) As currently implemented, the relative performance between algorithms should 
be unaffected by thread count, correct?

Any suggestions would be appreciated.

Thanks,

Brian

[1] 
http://www.everymac.com/systems/apple/macbook_pro/specs/macbook-pro-core-2-duo-2.8-aluminum-15-mid-2009-sd-unibody-specs.html
[2] http://openjdk.java.net/projects/code-tools/jmh/
