Alan Eliasen wrote:
Joseph D. Darcy wrote:
Yes, this is the right group :-) As "Java Floating-Point Czar" I also
look after BigInteger, although I haven't had much time for proactive
maintenance there in a while. I think using faster, more mathematically
sophisticated algorithms in BigInteger for large values would be a fine
change to explore, as long as the performance on small values was
preserved and regression tests appropriate for the new algorithms were
developed too.
My last step for this patch is improving performance of pow() for
small numbers, which got slightly slower for some small arguments. (But
some arguments, like those containing powers of 2 in the base, are
*much* faster.) The other functions like multiply() are basically
unchanged for small numbers. The new code, as you might expect,
examines the size of the numbers and runs the old "grade-school"
algorithm on small numbers and the Karatsuba algorithm on larger numbers
(with the threshold point determined by benchmark and experiment).
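To make the shape of that concrete, here is a rough, self-contained sketch of the idea written purely in terms of public BigInteger operations; the threshold value, class name, and method names are placeholders I made up for illustration, not what the actual patch uses:

import java.math.BigInteger;

// Illustrative sketch only, not the submitted patch: size-based dispatch
// between the existing "grade-school" multiply and a Karatsuba step built
// from high-level BigInteger operations.
public class KaratsubaSketch {

    // Placeholder cutoff; the real threshold is chosen by benchmarking.
    private static final int THRESHOLD_BITS = 1600;

    public static BigInteger multiply(BigInteger x, BigInteger y) {
        if (x.bitLength() < THRESHOLD_BITS || y.bitLength() < THRESHOLD_BITS)
            return x.multiply(y);              // stand-in for the O(n^2) path

        // Split both operands around the midpoint of the larger one:
        //   x = xh * 2^half + xl,   y = yh * 2^half + yl
        int half = Math.max(x.bitLength(), y.bitLength()) / 2;
        BigInteger xh = x.shiftRight(half);
        BigInteger xl = x.subtract(xh.shiftLeft(half));
        BigInteger yh = y.shiftRight(half);
        BigInteger yl = y.subtract(yh.shiftLeft(half));

        // Three half-size products instead of four.
        BigInteger p1 = multiply(xh, yh);
        BigInteger p2 = multiply(xl, yl);
        BigInteger p3 = multiply(xh.add(xl), yh.add(yl));

        // x*y = p1*2^(2*half) + (p3 - p1 - p2)*2^half + p2
        return p1.shiftLeft(2 * half)
                 .add(p3.subtract(p1).subtract(p2).shiftLeft(half))
                 .add(p2);
    }
}

(The actual implementation presumably works on the internal int[] magnitude and measures operand size in ints rather than bits; the sketch only shows the control flow.)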
As to the matter of regression tests, I would presume that Sun
already has regression tests for BigInteger to make sure it gets correct
results. Can you provide me with these, or are they in the OpenJDK
distribution already?
Let's see, I haven't moved the existing tests over from the closed world
to open, but I can do that after the repositories are accepting changes
again.
I can check these to make sure that they use
numbers big enough to trigger the thresholds, and if not, extend them
in the same style.
A general comment is that BigInteger and BigDecimal are rather old
classes and our expectations for regression tests have increased over
time, which is to say there are fewer regression tests than if the
classes were developed today. For my own numerical work, a very large
fraction of my engineering time is now spent developing tests rather
than code.
Regression tests should hopefully be a no-op,
assuming you have some already. I'm not adding any new functionality
and no results should change--they should just run faster, of course.
While all the existing tests should still pass, that doesn't necessarily
imply that no new tests should be written :-)
Especially for numerics, the tests need to probe the algorithms where
they are likely to fail, and the likely failure points can change with
the algorithm. Taking one notorious example, the Pentium FDIV
instruction passed existing divide tests and ran billions of random
tests successfully, but (after the fact) a small program targeted at
probing interesting SRT boundaries was able to find an indication of a
problem after running for only a fraction of a second:
http://www.cs.berkeley.edu/~wkahan/srtest
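In the same spirit, one obvious thing to probe here is operand sizes that straddle whatever cutoff the new code uses, cross-checked against an independently assembled result. A rough sketch of what I have in mind (the threshold value is a made-up placeholder, and it assumes a product with one sub-threshold operand takes the old path):

import java.math.BigInteger;
import java.util.Random;

// Rough sketch of a targeted boundary probe, not an actual regression test.
// ASSUMED_THRESHOLD_BITS is invented; a real test would use the cutoff the
// implementation actually uses.
public class MultiplyBoundaryProbe {

    private static final int ASSUMED_THRESHOLD_BITS = 1600;

    public static void main(String[] args) {
        Random rnd = new Random(42);                   // fixed seed, reproducible
        for (int delta = -64; delta <= 64; delta += 32) {
            int bits = ASSUMED_THRESHOLD_BITS + delta;
            BigInteger x = new BigInteger(bits, rnd);
            BigInteger y = new BigInteger(bits, rnd);

            BigInteger direct = x.multiply(y);         // path under test

            // Independent reference: split x so each partial product has a
            // sub-threshold operand and (assuming the usual dispatch) goes
            // through the old grade-school path, then recombine with shifts.
            int half = bits / 2;
            BigInteger xh = x.shiftRight(half);
            BigInteger xl = x.subtract(xh.shiftLeft(half));
            BigInteger reference =
                xh.multiply(y).shiftLeft(half).add(xl.multiply(y));

            if (!direct.equals(reference))
                throw new AssertionError("multiply mismatch at " + bits + " bits");
        }
        System.out.println("boundary probes passed");
    }
}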
It should of course be impossible to write a regression test that
"succeeds with the patch applied, and fails without the patch" like is
requested on the OpenJDK page, unless it is time-limited.
Of course, I could extend the regression tests to cover *huge* numbers,
which may or may not be desirable if you want the regression tests to run
in a short time. For example, a single exponentiation that takes
milliseconds in my new version takes on the order of 15 minutes with JDK
1.6. How long are you willing to let regression tests take? How many
combinations of arguments do you currently test? Do you think more are
necessary?
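For what it's worth, one way to sanity-check pow() on a huge base without a table of known answers, and without a huge time budget, is to reduce the result modulo a small modulus and compare against modPow(), which goes through an independent code path. A rough sketch (the base size and exponent are arbitrary choices for illustration):

import java.math.BigInteger;
import java.util.Random;

// Sketch of a cheap consistency check for pow() on a large base, not an
// actual regression test.
public class HugePowCheck {
    public static void main(String[] args) {
        BigInteger base = BigInteger.valueOf(3).pow(5000)    // a few thousand bits
                                    .add(BigInteger.ONE);
        int exponent = 50;

        long start = System.nanoTime();
        BigInteger result = base.pow(exponent);              // operation under test
        long elapsedMs = (System.nanoTime() - start) / 1000000;

        // Independent check: (base^exponent) mod m must equal modPow(exponent, m),
        // which is computed by a separate algorithm and never sees the huge result.
        BigInteger m = BigInteger.probablePrime(62, new Random(1));
        BigInteger expected = base.modPow(BigInteger.valueOf(exponent), m);
        if (!result.mod(m).equals(expected))
            throw new AssertionError("pow() and modPow() disagree");

        System.out.println(base.bitLength() + "-bit base raised to " + exponent
                           + " in " + elapsedMs + " ms");
    }
}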
I'm confident the existing tests will need to be augmented; I can work
with you on developing them.
I'd prefer to process changes in smaller chunks rather than one huge one
all at once; however, I may be a bit slow on reviews in the near future
due to some other OpenJDK work I'm leading up.
I will be submitting a patch addressing the three functions:
multiply(), pow() and (the private) square(). These are intertwined and
it would be more work and testing for both of us to separate out the
patches and apply them in 3 phases. The patch will definitely not be huge.
Yes, that sounds like a good bundle of initial changes.
My patches are designed to be as readable and simple as possible for
this phase. They all build on existing functions, and eschew lots of
low-level bit-fiddling, as those types of changes are harder to
understand and debug. I think it's best to get working algorithms with
better asymptotic efficiency, as those will vastly improve performance
for large numbers, and tune them by doing more low-level bit fiddling
later. Even without being tuned to the nth degree, the new algorithms
are vastly faster for large numbers, and identical for small numbers.
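As a small illustration of the flavor (again just a sketch of my own, not the submitted patch): a power-of-two factor in the base can be handled entirely with high-level calls by stripping it up front, exponentiating the odd part with ordinary square-and-multiply, and reintroducing it with one shift at the end, which is the kind of thing that makes bases containing powers of 2 so much faster:

import java.math.BigInteger;

// Sketch only, not the submitted patch: pow() built from public BigInteger
// operations, exploiting a power-of-two factor in the base via shifts.
public class PowSketch {

    public static BigInteger pow(BigInteger base, int exponent) {
        if (exponent < 0)
            throw new ArithmeticException("negative exponent");
        if (base.signum() == 0)
            return exponent == 0 ? BigInteger.ONE : BigInteger.ZERO;

        int k = base.getLowestSetBit();           // base = oddPart * 2^k
        BigInteger oddPart = base.shiftRight(k);

        // Ordinary square-and-multiply on the odd part; every product goes
        // through multiply() and hence whatever algorithm it dispatches to.
        BigInteger result = BigInteger.ONE;
        BigInteger square = oddPart;
        for (int e = exponent; e != 0; e >>>= 1) {
            if ((e & 1) != 0)
                result = result.multiply(square);
            square = square.multiply(square);
        }

        // (2^k)^exponent folds back in as a single shift of k*exponent bits.
        // (This sketch ignores possible int overflow in k * exponent.)
        return result.shiftLeft(k * exponent);
    }

    public static void main(String[] args) {
        BigInteger base = BigInteger.valueOf(1000);               // 1000 = 125 * 2^3
        System.out.println(pow(base, 20).equals(base.pow(20)));   // prints true
    }
}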
Regards,
-Joe