Alex Herbert created STATISTICS-70:
--------------------------------------
Summary: Improve the CDF of the Hypergeometric distribution
Key: STATISTICS-70
URL: https://issues.apache.org/jira/browse/STATISTICS-70
Project: Commons Statistics
Issue Type: Improvement
Components: distribution
Affects Versions: 1.0
Reporter: Alex Herbert
The hypergeometric distribution computes the CDF and the survival function (SF)
using a summation of the PDF. This can be improved by caching a midpoint and
summing only the section (lower or upper) that lies on the same side of the
midpoint as the argument. The complement can then be used to compute the
function in the other domain, e.g. CDF = 1 - SF.
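
As an illustration only, the midpoint idea might be sketched as follows. The class name, the field names and the plain double-precision binomial helper are assumptions made for this example and are not the Commons Statistics implementation; a real implementation would need a PDF that is stable for large parameters. The mode is used as the midpoint here.

```java
/** Sketch only: hypothetical class, not the Commons Statistics API. */
final class HypergeometricCdfSketch {
    private final int populationSize;   // N
    private final int successes;        // K
    private final int sampleSize;       // n
    private final int lower;            // support lower bound
    private final int upper;            // support upper bound
    private final int midpoint;         // cached midpoint (the mode, by assumption)

    HypergeometricCdfSketch(int populationSize, int successes, int sampleSize) {
        this.populationSize = populationSize;
        this.successes = successes;
        this.sampleSize = sampleSize;
        lower = Math.max(0, sampleSize - (populationSize - successes));
        upper = Math.min(sampleSize, successes);
        // mode = floor((n+1)(K+1)/(N+2)), computed in long to avoid overflow
        midpoint = (int) (((long) sampleSize + 1) * ((long) successes + 1)
                          / ((long) populationSize + 2));
    }

    /** Binomial coefficient in double precision; adequate for small arguments only. */
    private static double binom(int n, int k) {
        double r = 1;
        for (int i = 1; i <= k; i++) {
            r = r * (n - k + i) / i;
        }
        return r;
    }

    double pmf(int x) {
        if (x < lower || x > upper) {
            return 0;
        }
        return binom(successes, x)
            * binom(populationSize - successes, sampleSize - x)
            / binom(populationSize, sampleSize);
    }

    /** Sum of the PMF over the closed range [from, to]. */
    private double sum(int from, int to) {
        double s = 0;
        for (int x = from; x <= to; x++) {
            s += pmf(x);
        }
        return s;
    }

    double cdf(int x) {
        if (x < lower) {
            return 0;
        }
        if (x >= upper) {
            return 1;
        }
        // Sum the lower section directly, or use the complement of the upper section.
        return x <= midpoint ? sum(lower, x) : 1 - sum(x + 1, upper);
    }

    double sf(int x) {
        if (x < lower) {
            return 1;
        }
        if (x >= upper) {
            return 0;
        }
        // Sum the upper section directly, or use the complement of the lower section.
        return x <= midpoint ? 1 - sum(lower, x) : sum(x + 1, upper);
    }
}
```

With N=10, K=4, n=3 the mode is 1, so cdf(2) is evaluated as 1 - pdf(3) rather than by summing pdf(0) to pdf(2).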
Other functions can also exploit this summation:
* The probability(x0, x1) function can be computed by summing the PDF over the
range (x0, x1]. Currently it uses the default implementation, which is CDF(x1) -
CDF(x0). That duplicates the part of the summation up to x0.
* The inverse CDF and inverse SF use the default implementation of a bracketed
bisection search of the CDF or SF. This can be updated to simply sum the PDF
until the target CDF / SF is obtained. This effectively changes the function to
a single call to the smaller of CDF or SF to find the target quantile.
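
A rough sketch of the two ideas above, under stated assumptions: the class and method names are hypothetical, the parameters are passed explicitly as (N, K, n), and the PDF uses a simple double-precision binomial that only suits small arguments. The inverse shown here always sums from the lower bound; choosing the smaller of CDF or SF is left out to keep the example short.

```java
/** Sketch only: hypothetical helpers, not the Commons Statistics API. */
final class HypergeometricRangeSketch {
    private HypergeometricRangeSketch() {}

    /** Binomial coefficient in double precision; adequate for small arguments only. */
    static double binom(int n, int k) {
        double r = 1;
        for (int i = 1; i <= k; i++) {
            r = r * (n - k + i) / i;
        }
        return r;
    }

    /** PMF for population size N, number of successes K and sample size n. */
    static double pmf(int N, int K, int n, int x) {
        if (x < Math.max(0, n - (N - K)) || x > Math.min(n, K)) {
            return 0;
        }
        return binom(K, x) * binom(N - K, n - x) / binom(N, n);
    }

    /** P(x0 < X <= x1), summing the PMF over (x0, x1] only. */
    static double probability(int N, int K, int n, int x0, int x1) {
        if (x0 > x1) {
            throw new IllegalArgumentException("x0 > x1");
        }
        int lower = Math.max(0, n - (N - K));
        int upper = Math.min(n, K);
        if (x0 >= upper || x1 < lower) {
            return 0;   // the range does not overlap the support
        }
        int from = Math.max(x0 + 1, lower);
        int to = Math.min(x1, upper);
        double sum = 0;
        for (int x = from; x <= to; x++) {
            sum += pmf(N, K, n, x);
        }
        return sum;
    }

    /** Smallest x with CDF(x) >= p, found by accumulating the PMF from the lower bound. */
    static int inverseCdf(int N, int K, int n, double p) {
        if (!(p >= 0 && p <= 1)) {
            throw new IllegalArgumentException("p out of [0, 1]: " + p);
        }
        int lower = Math.max(0, n - (N - K));
        int upper = Math.min(n, K);
        double cdf = 0;
        for (int x = lower; x < upper; x++) {
            cdf += pmf(N, K, n, x);
            if (cdf >= p) {
                return x;
            }
        }
        return upper;
    }
}
```

The range sum never touches the values at or below x0, and the inverse needs one pass over the PDF instead of repeated CDF evaluations inside a bisection.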
The midpoint could be the median (CDF ~ SF ~ 0.5), which requires computation,
or the mode, which is floor((n+1)(K+1)/(N+2)). Judging by example density
functions, the two values should be similar (see [Hypergeometric distribution
(Wikipedia)|https://en.wikipedia.org/wiki/Hypergeometric_distribution]).
However, to ensure strict inversion the p-value at the midpoint would also be
required, so that the inverse implementation can correctly switch the choice of
which function to invert.
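
One possible shape for such an inverse, sketched under assumptions: the mode is taken as the midpoint, CDF(midpoint) is cached at construction, and the inverse CDF compares p against that cached value to decide whether to accumulate the CDF upwards or the SF downwards. All names are hypothetical, the double-precision binomial helper only suits small arguments, and the use of 1 - p for the upper section is a simplification.

```java
/** Sketch only: hypothetical class, not the Commons Statistics API. */
final class HypergeometricInverseSketch {
    private final int populationSize;   // N
    private final int successes;        // K
    private final int sampleSize;       // n
    private final int lower;
    private final int upper;
    final int midpoint;                 // the mode, by assumption
    final double midpointCdf;           // cached p-value CDF(midpoint)

    HypergeometricInverseSketch(int populationSize, int successes, int sampleSize) {
        this.populationSize = populationSize;
        this.successes = successes;
        this.sampleSize = sampleSize;
        lower = Math.max(0, sampleSize - (populationSize - successes));
        upper = Math.min(sampleSize, successes);
        // mode = floor((n+1)(K+1)/(N+2))
        midpoint = (int) (((long) sampleSize + 1) * ((long) successes + 1)
                          / ((long) populationSize + 2));
        double c = 0;
        for (int x = lower; x <= midpoint; x++) {
            c += pmf(x);
        }
        midpointCdf = c;
    }

    private static double binom(int n, int k) {
        double r = 1;
        for (int i = 1; i <= k; i++) {
            r = r * (n - k + i) / i;
        }
        return r;
    }

    double pmf(int x) {
        if (x < lower || x > upper) {
            return 0;
        }
        return binom(successes, x)
            * binom(populationSize - successes, sampleSize - x)
            / binom(populationSize, sampleSize);
    }

    /** Smallest x with CDF(x) >= p. */
    int inverseCdf(double p) {
        if (!(p >= 0 && p <= 1)) {
            throw new IllegalArgumentException("p out of [0, 1]: " + p);
        }
        if (p <= midpointCdf) {
            // Lower section: accumulate the CDF upwards from the lower bound.
            double cdf = 0;
            for (int x = lower; x < midpoint; x++) {
                cdf += pmf(x);
                if (cdf >= p) {
                    return x;
                }
            }
            return midpoint;
        }
        // Upper section: accumulate the SF downwards from the upper bound.
        // The answer is the smallest x with SF(x) <= q, where q = 1 - p.
        double q = 1 - p;
        double sf = 0;                          // SF(upper) = 0
        for (int x = upper; x > midpoint + 1; x--) {
            double sfBelow = sf + pmf(x);       // SF(x - 1)
            if (sfBelow > q) {
                return x;                       // x - 1 is too small, so x is the quantile
            }
            sf = sfBelow;
        }
        // CDF(midpoint) < p, so the quantile is just above the midpoint
        // (clamped in case rounding left midpointCdf slightly below 1).
        return Math.min(midpoint + 1, upper);
    }
}
```

Comparing p against the same cached value that the lower-section summation would produce keeps the switch consistent with the sums themselves. An inverse SF would mirror this by comparing its argument against SF(midpoint).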