Alex Herbert created STATISTICS-70:
--------------------------------------

             Summary: Improve the CDF of the Hypergeometric distribution
                 Key: STATISTICS-70
                 URL: https://issues.apache.org/jira/browse/STATISTICS-70
             Project: Commons Statistics
          Issue Type: Improvement
          Components: distribution
    Affects Versions: 1.0
            Reporter: Alex Herbert


The hypergeometric distribution computes the CDF and the survival function (SF) 
by summing the PDF. This can be improved by caching a midpoint and summing only 
the lower or the upper section, depending on which side of the midpoint the 
argument lies. The complement can then be used to compute the function in the 
other domain, e.g. CDF = 1 - SF.
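
A minimal sketch of the midpoint idea, assuming the mode is used as the cached midpoint (one of the two options discussed for the midpoint) and that N, K and n are the population size, the number of successes and the number of draws. The class name HypergeometricTailSum and its methods are hypothetical and are not the Commons Statistics API. The PMF is computed from a table of log factorials, which is only reasonable for small N.

```java
/**
 * Sketch: hypergeometric CDF and SF that sum only the PMF terms on one side
 * of a cached midpoint (here the mode) and use the complement otherwise.
 * Hypothetical class, not the Commons Statistics API.
 * N = population size, K = successes, n = draws.
 */
final class HypergeometricTailSum {
    private final int populationSize; // N
    private final int successes;      // K
    private final int draws;          // n
    private final int lower;          // smallest x with non-zero probability
    private final int upper;          // largest x with non-zero probability
    private final int mid;            // cached midpoint (the mode)
    private final double[] logFactorial; // logFactorial[i] = log(i!)

    HypergeometricTailSum(int populationSize, int successes, int draws) {
        this.populationSize = populationSize;
        this.successes = successes;
        this.draws = draws;
        lower = Math.max(0, draws - (populationSize - successes));
        upper = Math.min(draws, successes);
        // Mode: floor((n + 1)(K + 1) / (N + 2)), clipped to the support.
        long mode = ((long) draws + 1) * (successes + 1) / (populationSize + 2);
        mid = (int) Math.max(lower, Math.min(upper, mode));
        logFactorial = new double[populationSize + 1];
        for (int i = 2; i <= populationSize; i++) {
            logFactorial[i] = logFactorial[i - 1] + Math.log(i);
        }
    }

    private double logBinomial(int a, int b) {
        return logFactorial[a] - logFactorial[b] - logFactorial[a - b];
    }

    /** P(X = x). */
    double pmf(int x) {
        if (x < lower || x > upper) {
            return 0;
        }
        return Math.exp(logBinomial(successes, x)
            + logBinomial(populationSize - successes, draws - x)
            - logBinomial(populationSize, draws));
    }

    /** P(X <= x): lower sum at or below the midpoint, else 1 - upper sum. */
    double cdf(int x) {
        if (x < lower) return 0;
        if (x >= upper) return 1;
        return x <= mid ? sum(lower, x) : 1 - sum(x + 1, upper);
    }

    /** P(X > x): 1 - lower sum at or below the midpoint, else upper sum. */
    double sf(int x) {
        if (x < lower) return 1;
        if (x >= upper) return 0;
        return x <= mid ? 1 - sum(lower, x) : sum(x + 1, upper);
    }

    /** Sum of P(X = k) for k in [from, to]. */
    private double sum(int from, int to) {
        double s = 0;
        for (int k = from; k <= to; k++) {
            s += pmf(k);
        }
        return s;
    }
}
```

Each call sums at most the terms between the support bound and the midpoint on the side where x lies, rather than always summing from the lower bound.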

Other functions can also exploit this summation:
 * The probability(x0, x1) function can be computed by summing the PDF over the 
range (x0, x1]. Currently it uses the default implementation, CDF(x1) - CDF(x0), 
which duplicates part of the summation (i.e. the terms up to x0).
 * The inverse CDF and inverse SF use the default implementation, a bracketed 
bisection search of the CDF or SF. This can be replaced by summing the PDF 
until the target CDF / SF is reached, which reduces the cost to a single 
evaluation of the smaller of the CDF or SF to find the target quantile.
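
The two summation-based functions in the list above could be sketched as follows. This assumes the PMF values for the whole support are precomputed into an array (adequate for small N only); the class HypergeometricSummation is hypothetical. probability(x0, x1) sums the range (x0, x1] once, and inverseCdf returns the smallest x with CDF(x) >= p by accumulating the PMF from the lower bound instead of bisecting.

```java
/**
 * Sketch: range probability and inverse CDF by direct summation of the
 * hypergeometric PMF. Hypothetical class, not the Commons Statistics API.
 * N = population size, K = successes, n = draws.
 */
final class HypergeometricSummation {
    private final int lower;      // smallest x with non-zero probability
    private final int upper;      // largest x with non-zero probability
    private final double[] probs; // probs[x - lower] = P(X = x)

    HypergeometricSummation(int populationSize, int successes, int draws) {
        lower = Math.max(0, draws - (populationSize - successes));
        upper = Math.min(draws, successes);
        double[] lf = new double[populationSize + 1]; // lf[i] = log(i!)
        for (int i = 2; i <= populationSize; i++) {
            lf[i] = lf[i - 1] + Math.log(i);
        }
        probs = new double[upper - lower + 1];
        for (int x = lower; x <= upper; x++) {
            probs[x - lower] = Math.exp(logBinomial(lf, successes, x)
                + logBinomial(lf, populationSize - successes, draws - x)
                - logBinomial(lf, populationSize, draws));
        }
    }

    private static double logBinomial(double[] lf, int a, int b) {
        return lf[a] - lf[b] - lf[a - b];
    }

    /** P(x0 < X <= x1), summing only the terms in (x0, x1]. */
    double probability(int x0, int x1) {
        int from = Math.max(x0, lower - 1) + 1;
        int to = Math.min(x1, upper);
        double s = 0;
        for (int k = from; k <= to; k++) {
            s += probs[k - lower];
        }
        return s;
    }

    /** Smallest x with P(X <= x) >= p, accumulating the PMF from the lower bound. */
    int inverseCdf(double p) {
        if (!(p >= 0 && p <= 1)) {
            throw new IllegalArgumentException("p not in [0, 1]: " + p);
        }
        double cdf = 0;
        for (int x = lower; x < upper; x++) {
            cdf += probs[x - lower];
            if (cdf >= p) {
                return x;
            }
        }
        return upper;
    }
}
```

In this sketch inverseCdf accumulates terms in the same order as probability(lower - 1, x), so the returned quantile is exactly consistent with that cumulative sum in floating point.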

The midpoint could be the median (CDF ~ SF ~ 0.5), which requires computation, 
or the mode, which is floor((n+1)(K+1)/(N+2)) where N is the population size, K 
is the number of successes in the population and n is the number of draws. 
Judging from example plots of the distribution, the two values should be similar 
(see [Hypergeometric distribution (Wikipedia)|https://en.wikipedia.org/wiki/Hypergeometric_distribution]). 
However, to ensure strict inversion, the p-value at the midpoint would also be 
required so the inverse implementation can correctly switch the choice of which 
function to invert.
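
A sketch of the switching inverse, assuming the mode is the midpoint and its CDF value is cached at construction. When p <= CDF(mode) the lower tail is summed upwards; otherwise the upper tail is summed downwards, using the equivalence CDF(x) >= p if and only if SF(x) <= 1 - p. HypergeometricInverse is a hypothetical name. The 1 - p step loses precision for p close to 1; an inverse SF that takes the upper-tail probability directly would avoid that.

```java
/**
 * Sketch: inverse CDF that uses a cached midpoint (the mode) and its CDF
 * value to choose which tail to sum. Hypothetical class, not the
 * Commons Statistics API. N = population size, K = successes, n = draws.
 */
final class HypergeometricInverse {
    private final int lower;      // smallest x with non-zero probability
    private final int upper;      // largest x with non-zero probability
    private final int mid;        // cached midpoint: the mode
    private final double cdfMid;  // cached P(X <= mid)
    private final double[] probs; // probs[x - lower] = P(X = x)

    HypergeometricInverse(int populationSize, int successes, int draws) {
        lower = Math.max(0, draws - (populationSize - successes));
        upper = Math.min(draws, successes);
        double[] lf = new double[populationSize + 1]; // lf[i] = log(i!)
        for (int i = 2; i <= populationSize; i++) {
            lf[i] = lf[i - 1] + Math.log(i);
        }
        probs = new double[upper - lower + 1];
        for (int x = lower; x <= upper; x++) {
            probs[x - lower] = Math.exp(logBinomial(lf, successes, x)
                + logBinomial(lf, populationSize - successes, draws - x)
                - logBinomial(lf, populationSize, draws));
        }
        // Mode: floor((n + 1)(K + 1) / (N + 2)), clipped to the support.
        long mode = ((long) draws + 1) * (successes + 1) / (populationSize + 2);
        mid = (int) Math.max(lower, Math.min(upper, mode));
        double sum = 0;
        for (int x = lower; x <= mid; x++) {
            sum += probs[x - lower];
        }
        cdfMid = sum;
    }

    private static double logBinomial(double[] lf, int a, int b) {
        return lf[a] - lf[b] - lf[a - b];
    }

    int mode() {
        return mid;
    }

    double pmf(int x) {
        return x < lower || x > upper ? 0 : probs[x - lower];
    }

    /** Smallest x with P(X <= x) >= p. */
    int inverseCdf(double p) {
        if (!(p >= 0 && p <= 1)) {
            throw new IllegalArgumentException("p not in [0, 1]: " + p);
        }
        if (p <= cdfMid) {
            // Target is at or below the midpoint: sum the lower tail upwards.
            double cdf = 0;
            for (int x = lower; x < mid; x++) {
                cdf += probs[x - lower];
                if (cdf >= p) {
                    return x;
                }
            }
            return mid;
        }
        // Target is above the midpoint: sum the upper tail downwards.
        // CDF(x) >= p  <=>  SF(x) = P(X > x) <= q = 1 - p.
        double q = 1 - p;
        double sf = 0; // SF(upper) = 0
        int x = upper;
        while (x > mid + 1) {
            double next = sf + probs[x - lower]; // SF(x - 1)
            if (next > q) {
                break;
            }
            sf = next;
            x--;
        }
        return x;
    }
}
```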

--
This message was sent by Atlassian Jira
(v8.20.10#820010)