Alex Herbert created NUMBERS-183:
------------------------------------

             Summary: Improve binomial coefficient classes
                 Key: NUMBERS-183
                 URL: https://issues.apache.org/jira/browse/NUMBERS-183
             Project: Commons Numbers
          Issue Type: Improvement
          Components: combinatorics
    Affects Versions: 1.0
            Reporter: Alex Herbert
             Fix For: 1.1


The binomial coefficient classes can be improved by several small changes.
h2. Avoid recursion

All classes use a recursive method call to compute either C(n, k) or C(n, n - 
k) depending which is smaller. The methods can avoid recursion using m = min(k, 
n - k) and computing C(n, m).
h2. Threshold for overflow

The LogBinomialCoeffiecient knows that C(1030, 515) overflows a double. But the 
BinomialCoefficientDouble does not. It will compute C(100000, 50000) with a 
loop of 50000 computations and return infinity. The class should be updated 
with some better thresholds:
{noformat}
C(1030, 515)           ~ 2.85e308   =>  m >= 515  is overflow
C(2147483647, 37) * 37 ~ 5.13e303   =>  m < 38    cannot overflow
{noformat}
h2. Overflow protection

Better overflow protection is required for large results. These all currently 
fail as they overflow to infinity in the intermediate computation:
{code:java}
    @ParameterizedTest
    @CsvSource({
        "1040, 450, 2.3101613255412135615e307",
        "1029, 514, 1.4298206864989040819e308",
        "1786388282, 38, 7.187239013254065384599502085053593e306",
        "1914878305, 38, 100.6570419073661447979173868523364e306",
        "1179067476, 39, 30.22890249420109200962786203300876e306",
        "2147483647, 37, 1.388890512412231479281222156415993e302",
    })
    void testBinomialCoefficientLargeK(int n, int k, double nCk) {
        final double x = BinomialCoefficientDouble.value(n, k);
        Assertions.assertEquals(nCk, x, Math.ulp(nCk) * 10,
            () -> "ULP error: " + (x - nCk) / Math.ulp(nCk));
    }
{code}
A test for overflow in the loop can rearrange the terms to compute the correct 
result.
h2. Use the known factorials

The BinomialCoefficientDouble class can exploit the known factorials stored in 
the Factorial class and avoid a loop computation for small n:
{code:java}
if (n <= 170) {
    // Use fast table lookup:
    result = Factorial.doubleValue(n);
    // Smaller m will have a more accurate factorial and may be exact
    result /= Factorial.doubleValue(m);
    result /= Factorial.doubleValue(n - m);
}{code}
The result appears to be as accurate as the current implementation on a partial 
test dataset. The complete dataset of factorials will require approximately 
(sum i=1 ... 170/2) * 2 == 7310 values so could be tested to see if this fast 
method is robust.
h2. Avoid cancellation in summations

The LogBinomialCoefficient sum will suffer from cancellation:
{code:java}
        // Sum for values that could overflow.
        double logSum = 0;

        // n! / (n - k)!
        for (int i = n - k + 1; i <= n; i++) {
            logSum += Math.log(i);
        }

        // Divide by k!
        for (int i = 2; i <= k; i++) {
            logSum -= Math.log(i);
        }

        return logSum;
{code}
The divide by k! should be summed separately and subtracted as a single sum. 
This could be performed using the LogFactorial class which can cache results 
for reuse.
h2. Use the gamma functions

Note: The combinatorics package depends on the gamma package. The factorial can 
be computed using the gamma function:
{noformat}
n! = gamma(n+1)
{noformat}
This is used in the LogFactorial class which delegates to LogGamma.

The binomial coefficient can be computed using the beta function:
{noformat}
   n!                gamma(n+1)               gamma(k+1) * gamma(n-k+1)
---------   = ------------------------- = 1 / -------------------------
k! (n-k)!     gamma(k+1) * gamma(n-k+1)              gamma(n+1)


            = 1 / (k * beta(k, n-k+1)) = 1 / ((n-k) * beta(k+1, n-k))


beta(a, b) = gamma(a) * gamma(b) / gamma(a+b)
{noformat}
Note: In converting the Boost beta function implementation (NUMBERS-181) I have 
tested using the beta function to compute the binomial coefficient. It is not 
more accurate than using the current implementation within a loop to compute 
the multiplication of terms.

I have not tested using LogBeta to compute the LogBinomialCoefficient. The 
accuracy of this should be checked against using the current summation of logs 
within a loop.

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to