[ https://issues.apache.org/jira/browse/NUMBERS-183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alex Herbert resolved NUMBERS-183. ---------------------------------- Resolution: Implemented Binomial coefficient classes update in commit: d22325416794452dcf19d998374b4dd1519523bf > Improve binomial coefficient classes > ------------------------------------ > > Key: NUMBERS-183 > URL: https://issues.apache.org/jira/browse/NUMBERS-183 > Project: Commons Numbers > Issue Type: Improvement > Components: combinatorics > Affects Versions: 1.0 > Reporter: Alex Herbert > Priority: Minor > Fix For: 1.1 > > > The binomial coefficient classes can be improved by several small changes. > h2. Avoid recursion > All classes use a recursive method call to compute either C(n, k) or C(n, n - > k) depending which is smaller. The methods can avoid recursion using m = > min(k, n - k) and computing C(n, m). > h2. Threshold for overflow > The LogBinomialCoeffiecient knows that C(1030, 515) overflows a double. But > the BinomialCoefficientDouble does not. It will compute C(100000, 50000) with > a loop of 50000 computations and return infinity. The class should be updated > with some better thresholds: > {noformat} > C(1030, 515) ~ 2.85e308 => m >= 515 is overflow > C(2147483647, 37) * 37 ~ 5.13e303 => m < 38 cannot overflow > {noformat} > h2. Overflow protection > Better overflow protection is required for large results. These all currently > fail as they overflow to infinity in the intermediate computation: > {code:java} > @ParameterizedTest > @CsvSource({ > "1040, 450, 2.3101613255412135615e307", > "1029, 514, 1.4298206864989040819e308", > "1786388282, 38, 7.187239013254065384599502085053593e306", > "1914878305, 38, 100.6570419073661447979173868523364e306", > "1179067476, 39, 30.22890249420109200962786203300876e306", > "2147483647, 37, 1.388890512412231479281222156415993e302", > }) > void testBinomialCoefficientLargeK(int n, int k, double nCk) { > final double x = BinomialCoefficientDouble.value(n, k); > Assertions.assertEquals(nCk, x, Math.ulp(nCk) * 10, > () -> "ULP error: " + (x - nCk) / Math.ulp(nCk)); > } > {code} > A test for overflow in the loop can rearrange the terms to compute the > correct result. > h2. Use the known factorials > The BinomialCoefficientDouble class can exploit the known factorials stored > in the Factorial class and avoid a loop computation for small n: > {code:java} > if (n <= 170) { > // Use fast table lookup: > result = Factorial.doubleValue(n); > // Smaller m will have a more accurate factorial and may be exact > result /= Factorial.doubleValue(m); > result /= Factorial.doubleValue(n - m); > }{code} > The result appears to be as accurate as the current implementation on a > partial test dataset. The complete dataset of factorials will require > approximately (sum i=1 ... 170/2) * 2 == 7310 values so could be tested to > see if this fast method is robust. > h2. Avoid cancellation in summations > The LogBinomialCoefficient sum will suffer from cancellation: > {code:java} > // Sum for values that could overflow. > double logSum = 0; > // n! / (n - k)! > for (int i = n - k + 1; i <= n; i++) { > logSum += Math.log(i); > } > // Divide by k! > for (int i = 2; i <= k; i++) { > logSum -= Math.log(i); > } > return logSum; > {code} > The divide by k! should be summed separately and subtracted as a single sum. > This could be performed using the LogFactorial class which can cache results > for reuse. > h2. Use the gamma functions > Note: The combinatorics package depends on the gamma package. The factorial > can be computed using the gamma function: > {noformat} > n! = gamma(n+1) > {noformat} > This is used in the LogFactorial class which delegates to LogGamma. > The binomial coefficient can be computed using the beta function: > {noformat} > n! gamma(n+1) gamma(k+1) * gamma(n-k+1) > --------- = ------------------------- = 1 / ------------------------- > k! (n-k)! gamma(k+1) * gamma(n-k+1) gamma(n+1) > = 1 / (k * beta(k, n-k+1)) = 1 / ((n-k) * beta(k+1, n-k)) > beta(a, b) = gamma(a) * gamma(b) / gamma(a+b) > {noformat} > Note: In converting the Boost beta function implementation (NUMBERS-181) I > have tested using the beta function to compute the binomial coefficient. It > is not more accurate than using the current implementation within a loop to > compute the multiplication of terms. > I have not tested using LogBeta to compute the LogBinomialCoefficient. The > accuracy of this should be checked against using the current summation of > logs within a loop. > -- This message was sent by Atlassian Jira (v8.20.1#820001)