[ 
https://issues.apache.org/jira/browse/NUMBERS-183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Herbert resolved NUMBERS-183.
----------------------------------
    Resolution: Implemented

Binomial coefficient classes update in commit:

d22325416794452dcf19d998374b4dd1519523bf

> Improve binomial coefficient classes
> ------------------------------------
>
>                 Key: NUMBERS-183
>                 URL: https://issues.apache.org/jira/browse/NUMBERS-183
>             Project: Commons Numbers
>          Issue Type: Improvement
>          Components: combinatorics
>    Affects Versions: 1.0
>            Reporter: Alex Herbert
>            Priority: Minor
>             Fix For: 1.1
>
>
> The binomial coefficient classes can be improved by several small changes.
> h2. Avoid recursion
> All classes use a recursive method call to compute either C(n, k) or C(n, n - 
> k) depending which is smaller. The methods can avoid recursion using m = 
> min(k, n - k) and computing C(n, m).
> h2. Threshold for overflow
> The LogBinomialCoeffiecient knows that C(1030, 515) overflows a double. But 
> the BinomialCoefficientDouble does not. It will compute C(100000, 50000) with 
> a loop of 50000 computations and return infinity. The class should be updated 
> with some better thresholds:
> {noformat}
> C(1030, 515)           ~ 2.85e308   =>  m >= 515  is overflow
> C(2147483647, 37) * 37 ~ 5.13e303   =>  m < 38    cannot overflow
> {noformat}
> h2. Overflow protection
> Better overflow protection is required for large results. These all currently 
> fail as they overflow to infinity in the intermediate computation:
> {code:java}
>     @ParameterizedTest
>     @CsvSource({
>         "1040, 450, 2.3101613255412135615e307",
>         "1029, 514, 1.4298206864989040819e308",
>         "1786388282, 38, 7.187239013254065384599502085053593e306",
>         "1914878305, 38, 100.6570419073661447979173868523364e306",
>         "1179067476, 39, 30.22890249420109200962786203300876e306",
>         "2147483647, 37, 1.388890512412231479281222156415993e302",
>     })
>     void testBinomialCoefficientLargeK(int n, int k, double nCk) {
>         final double x = BinomialCoefficientDouble.value(n, k);
>         Assertions.assertEquals(nCk, x, Math.ulp(nCk) * 10,
>             () -> "ULP error: " + (x - nCk) / Math.ulp(nCk));
>     }
> {code}
> A test for overflow in the loop can rearrange the terms to compute the 
> correct result.
> h2. Use the known factorials
> The BinomialCoefficientDouble class can exploit the known factorials stored 
> in the Factorial class and avoid a loop computation for small n:
> {code:java}
> if (n <= 170) {
>     // Use fast table lookup:
>     result = Factorial.doubleValue(n);
>     // Smaller m will have a more accurate factorial and may be exact
>     result /= Factorial.doubleValue(m);
>     result /= Factorial.doubleValue(n - m);
> }{code}
> The result appears to be as accurate as the current implementation on a 
> partial test dataset. The complete dataset of factorials will require 
> approximately (sum i=1 ... 170/2) * 2 == 7310 values so could be tested to 
> see if this fast method is robust.
> h2. Avoid cancellation in summations
> The LogBinomialCoefficient sum will suffer from cancellation:
> {code:java}
>         // Sum for values that could overflow.
>         double logSum = 0;
>         // n! / (n - k)!
>         for (int i = n - k + 1; i <= n; i++) {
>             logSum += Math.log(i);
>         }
>         // Divide by k!
>         for (int i = 2; i <= k; i++) {
>             logSum -= Math.log(i);
>         }
>         return logSum;
> {code}
> The divide by k! should be summed separately and subtracted as a single sum. 
> This could be performed using the LogFactorial class which can cache results 
> for reuse.
> h2. Use the gamma functions
> Note: The combinatorics package depends on the gamma package. The factorial 
> can be computed using the gamma function:
> {noformat}
> n! = gamma(n+1)
> {noformat}
> This is used in the LogFactorial class which delegates to LogGamma.
> The binomial coefficient can be computed using the beta function:
> {noformat}
>    n!                gamma(n+1)               gamma(k+1) * gamma(n-k+1)
> ---------   = ------------------------- = 1 / -------------------------
> k! (n-k)!     gamma(k+1) * gamma(n-k+1)              gamma(n+1)
>             = 1 / (k * beta(k, n-k+1)) = 1 / ((n-k) * beta(k+1, n-k))
> beta(a, b) = gamma(a) * gamma(b) / gamma(a+b)
> {noformat}
> Note: In converting the Boost beta function implementation (NUMBERS-181) I 
> have tested using the beta function to compute the binomial coefficient. It 
> is not more accurate than using the current implementation within a loop to 
> compute the multiplication of terms.
> I have not tested using LogBeta to compute the LogBinomialCoefficient. The 
> accuracy of this should be checked against using the current summation of 
> logs within a loop.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to