Cumulative probability and inverse cumulative probability inconsistencies
-------------------------------------------------------------------------

                 Key: MATH-692
                 URL: https://issues.apache.org/jira/browse/MATH-692
             Project: Commons Math
          Issue Type: Bug
    Affects Versions: 2.2, 2.1, 2.0, 1.2, 1.1, 1.0, 1.3, 2.2.1, 3.0
            Reporter: Christian Winter
             Fix For: 3.0


There are some inconsistencies in the documentation and implementation of 
functions regarding cumulative probabilities and inverse cumulative 
probabilities. More precisely, '<' and '<=' are not used in a consistent way.

Besides I would move the function inverseCumulativeProbability(double) to the 
interface Distribution. A true inverse of the distribution function does 
neither exist for Distribution nor for ContinuosDistribution. Thus we need to 
define the inverse in terms of quantiles anyway, and this can already be done 
for Distribution.

On the whole I would declare the (inverse) cumulative probability functions in 
the basic distribution interfaces as follows:

Distribution:
- cumulativeProbability(double x): returns P(X <= x)
- cumulativeProbability(double x0, double x1): returns P(x0 < X <= x1) [see 
also 1)]
- inverseCumulativeProbability(double p):
  returns the quantile function inf{x in R | P(X<=x) >= p} [see also 2), 3), 
and 4)]

1) An aternative definition could be P(x0 <= X <= x1). But this requires to put 
the function probability(double x) or another cumulative probability function 
into the interface Distribution in order be able to calculate P(x0 <= X <= x1) 
in AbstractDistribution.
2) This definition is stricter than the definition in ContinuousDistribution, 
because the definition there does not specify what to do if there are multiple 
x satisfying P(X<=x) = p.
3) A modification could be defined for p=0: Returning sup{x in R | P(X<=x) = 0} 
would yield the infimum of the distribution's support instead of a mandatory 
-infinity.
4) This affects issue MATH-540. I'd prefere the definition from above for the 
following reasons:
- This definition simplifies inverse transform sampling (as mentioned in the 
other issue).
- It is the standard textbook definition for the quantile function.
- For integer distributions it has the advantage that the result doesn't change 
when switching to "x in Z", i.e. the result is independent of considering the 
intergers as sole set or as part of the reals.

ContinuousDistribution:
nothing to be added regarding (inverse) cumulative probability functions

IntegerDistribution:
- cumulativeProbability(int x): returns P(X <= x)
- cumulativeProbability(int x0, int x1): returns P(x0 < X <= x1) [see also 1) 
above]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to