I have spent some time trying to assess the performance improvements to be 
gained by applying the patch

http://cr.openjdk.java.net/~bpb/4837946/

which I previously posted for code review. It is clear from the literature and 
from analysis of the algorithms involved that significant performance 
improvements should be obtained as the bit lengths of the factors increase. 
This also has been verified by other core-libs-dev members in their own 
testing. I have indeed observed performance improvements, but the bit lengths 
at which the faster algorithms begin to pay off are vastly different from the 
thresholds currently defined in the updated BigInteger class. I am posting this 
message as a sort of "sanity check" of my performance evaluation, as I think it 
is more likely that any unusual results are due to my measurements than to the 
tested code itself.

For performance testing I have thus far exclusively used my regular development 
platform, which is a 2.8 GHz MacBookPro5,3 [1] running Mac OS X 10.7.5 (Lion), 
upgraded, for what it's worth, to 8 GB of RAM and a 7200 RPM HDD.

For micro-benchmarking I have primarily used the Java Microbenchmark Harness 
(JMH) [2], although I wrote some standalone test code as well. Also, I 
constrained testing for the most part to the single-threaded case, although 
relative performance between algorithms was similar for two threads.
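
For concreteness, a minimal JMH benchmark along the lines of what I ran might 
look like the sketch below. The class name, parameter values, and seed are 
illustrative only, and it exercises the stock multiply() entry point rather 
than the per-algorithm methods described further down.

    import java.math.BigInteger;
    import java.util.Random;
    import java.util.concurrent.TimeUnit;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.OutputTimeUnit;
    import org.openjdk.jmh.annotations.Param;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.Setup;
    import org.openjdk.jmh.annotations.State;

    @State(Scope.Thread)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    public class BigIntegerMultiplyBench {

        // Bit length of each factor; values chosen to straddle the 1600-bit
        // Karatsuba threshold in the patched BigInteger.
        @Param({"800", "1600", "3200", "6400", "12800"})
        public int bits;

        BigInteger x;
        BigInteger y;

        @Setup
        public void setup() {
            // Fixed seed so every fork multiplies the same operands; setBit
            // forces each factor to have exactly 'bits' bits.
            Random rnd = new Random(42);
            x = new BigInteger(bits, rnd).setBit(bits - 1);
            y = new BigInteger(bits, rnd).setBit(bits - 1);
        }

        @Benchmark
        public BigInteger multiply() {
            // Returning the product keeps the JIT from eliminating the work.
            return x.multiply(y);
        }
    }

Returning the product from the benchmark method avoids dead-code elimination 
without needing an explicit Blackhole parameter.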

I am only going to quote a couple of results here pending feedback of a more 
general nature about performance testing.

1) First, for single-threaded multiplication of a 1601-bit BigInteger by a 
2000-bit BigInteger I am seeing that the Karatsuba algorithm is 60-70% slower 
than the standard algorithm, and 3-way Toom-Cook is slower still. The 60-70% 
ratio is consistent between the JMH-based tests and the standalone tests. A bit 
length of 1600 for both factors is the threshold at which Karatsuba 
multiplication should kick in.

2) Based on a number of runs with different bit lengths, I did not see the 
Karatsuba algorithm start to outperform the standard algorithm until bit 
lengths somewhere between 10400 and 12800. This is about 6.5-8 times the 
current threshold.

Note that these numbers are based on a modified version of BigInteger I created 
for testing. This modified version contains three public methods 
multiply_standard(), multiply_karatsuba(), and multiply_toom_cook(). The first 
of these is the usual standard algorithm. The second uses the Karatsuba 
algorithm until EITHER factor is below the threshold, then falls back to the 
standard algorithm. The third uses the Toom-Cook algorithm until BOTH factors 
are below the threshold, then likewise falls back to the standard algorithm. 
Keeping these methods separate avoids mixing the accelerated algorithms within 
a single test.
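
For reference, my standalone comparisons are in the spirit of the sketch below. 
It is illustrative only: the class name, iteration counts, and bit-length range 
are mine, and it assumes the modified BigInteger containing the methods above 
has been substituted for the stock class (e.g. via -Xbootclasspath/p), since 
multiply_standard() and multiply_karatsuba() do not exist in the unmodified 
JDK.

    import java.math.BigInteger;
    import java.util.Random;

    public class MultiplyCrossover {

        public static void main(String[] args) {
            Random rnd = new Random(42);
            for (int bits = 1600; bits <= 25600; bits *= 2) {
                // Force both factors to have exactly 'bits' bits.
                BigInteger x = new BigInteger(bits, rnd).setBit(bits - 1);
                BigInteger y = new BigInteger(bits, rnd).setBit(bits - 1);

                long std = avgNanos(x, y, true);
                long kar = avgNanos(x, y, false);
                System.out.printf("%6d bits: standard %9d ns/op, karatsuba %9d ns/op%n",
                                  bits, std, kar);
            }
        }

        // Crude average-time measurement: warm up first, then time a fixed
        // number of iterations, folding results into a checksum so the
        // multiplications cannot be optimized away.
        static long avgNanos(BigInteger x, BigInteger y, boolean standard) {
            final int warmup = 1_000, measured = 5_000;
            int checksum = 0;
            for (int i = 0; i < warmup; i++) {
                checksum ^= multiply(x, y, standard).bitCount();
            }
            long start = System.nanoTime();
            for (int i = 0; i < measured; i++) {
                checksum ^= multiply(x, y, standard).bitCount();
            }
            long elapsed = System.nanoTime() - start;
            System.err.println("checksum " + checksum);
            return elapsed / measured;
        }

        // multiply_standard() and multiply_karatsuba() are the public methods
        // added to the modified BigInteger described above; they are not part
        // of the stock JDK class.
        static BigInteger multiply(BigInteger x, BigInteger y, boolean standard) {
            return standard ? x.multiply_standard(y) : x.multiply_karatsuba(y);
        }
    }

Each run exercises a single algorithm end to end, which is the point of keeping 
the three methods separate; the crude warmup loop is of course only a rough 
substitute for what JMH does.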

Being quite skeptical of my own results, I have the following questions at this 
point:

A) Is there some particular approach that should be used in testing these 
algorithms for relative performance?

B) Could there be something platform-dependent happening?

C) As currently implemented, the relative performance between algorithms should 
be unaffected by thread count, correct?

Any suggestions would be appreciated.

Thanks,

Brian

[1] 
http://www.everymac.com/systems/apple/macbook_pro/specs/macbook-pro-core-2-duo-2.8-aluminum-15-mid-2009-sd-unibody-specs.html
[2] http://openjdk.java.net/projects/code-tools/jmh/
