[jira] [Comment Edited] (STATISTICS-36) Optimise the default inverse CDF function for continuous distributions

Alex Herbert (Jira) Thu, 16 Dec 2021 05:27:08 -0800


    [ 
https://issues.apache.org/jira/browse/STATISTICS-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460703#comment-17460703
 ]


Alex Herbert edited comment on STATISTICS-36 at 12/16/21, 1:26 PM:
-------------------------------------------------------------------

Each distribution using the default inverse probability method has at least one 
test case where the overall test tolerance is limited by the inverse 
probability test.

By default all the tests for a distribution should pass at a relative error of 
1e-12. The problem cases are listed below:

 
||Distribution||Params||Value||Relative Error||Absolute Error||
|Beta|alpha, beta|0.1, 4|5e-8| |
| | |1, 4|5e-8| |
| | |4, 0.1|1e-7| |
| | |4, 1|1e-7| |
|Chi Squared|degrees of freedom|5|1e-9| |
| | |0.1|Test disabled| |
| | |2|1.5e-8| |
| | |1|5e-7|2e-10|
|F|numerator degrees of freedom, denominator degrees of freedom|5, 6|1e-7| |
| | |1, 2|1e-9| |
| | |2, 1|1e-8| |
| | |5, 2|1e-7| |
| | |10, 1|1e-7| |
| | |100, 100|5e-9| |
|Gamma|Shape, Scale|4, 2|5e-11| |
|Nakagami|Shape, Scale|1/3, 1|1e-7| |
|T|Degrees of freedom|5|1e-8| |
| | |2|5e-11| |

The chi-squared distribution with very small degrees of freedom has CDF( x ) 
values of 1.17e-60 for p=0.01, 1.17e-40 for p=0.01. This distribution totally 
fails on the inverse CDF, inverse SF and the random sampling test. Note that 
the random sampler is OK (verified by supplying the quantiles manually) but the 
test requires the quantiles are computed by the distribution and it cannot 
invert p=0.25, 0.5 or 0.75.

 

I have made some modifications to the default inverse method to support 
adjusting the tolerances for the BrentSolver to the minimum allowed values. If 
the values are updated to the following then all the tests pass at their 
default test tolerance (1e-12 relative error). 

 
{code:java}
/** BrentSolver relative accuracy. Currently 1e-14. Change to 1.11e-16. */
private static final double SOLVER_RELATIVE_ACCURACY = 0x1.0p-53;
/** BrentSolver absolute accuracy. Currently 1e-9. */
private static final double SOLVER_ABSOLUTE_ACCURACY = 0;
/** BrentSolver function value accuracy. Currently 1e-15. */
private static final double SOLVER_FUNCTION_VALUE_ACCURACY = Double.MIN_VALUE; 
{code}
 

Notes:
 * The Brent solver reduces the bracket [a, b] surrounding a root and returns 
the value closest to the root (i.e. the function is ~ zero at this point)
 * Relative accuracy (eps) and absolute accuracy (abs) will stop the Brent 
solver reducing the interval when the distance between the two bracket values 
(a and b) is small. 
 * The actual tolerance is a combination of the two tolerances: tol = 2 * eps * 
|x| + abs; x = current min f( x ) of (a, b). So if both tolerances are small 
they can combine to be twice as big. To set very small values for the tolerance 
ideally requires setting one of them to zero.
 * Function value accuracy will return the initial bracket value a or b if the 
function at this point is within the function value accuracy for zero. Setting 
this to a very low value forces the Brent solver to perform a search. This 
accuracy is not used during the search, only to skip performing a search.

These threshold changes fix the broken chi-squared distribution. This is due to 
the absolute accuracy being too high. If the expected x values from inverting 
the CDF are around 1e-60 then an absolute tolerance of 1e-9 is not suitable.

Note that the current test suite calls inverse CDF and inverse SF using 
Double.MIN_VALUE. This identified a bug in the inverse probability function for 
a parameterisation of the F distribution:
{noformat}
F(1, 2)
sf(infinity) = 0.0
sf(max_value) = 5.56e-309{noformat}
Calling with min_value requires an x value outside the finite range of a 
double. This raised a bracketing error in the Brent solver. This has been fixed 
and the implementation will now return the correct bound. Once fixed the test 
suite runs using the updated threshold is a similar time. So no extremely time 
consuming searches are made.

To assess the performance impact a JMH test should be created that calls the 
inverse probability function with random p values in [0, 1] with various 
parameterisations of the distributions and different values for the Brent 
solver relative accuracy. I suggest setting the absolute accuracy and function 
accuracy to zero. Thus the inverse should get close to the value X irrespective 
of the magnitude of X.

 


was (Author: alexherbert):
Each distribution using the default inverse probability method has at least one 
test case where the overall test tolerance is limited by the inverse 
probability test.

By default all the tests for a distribution should pass at a relative error of 
1e-12. The problem cases are listed below:

 
||Distribution||Params||Value||Relative Error||Absolute Error||
|Beta|alpha, beta|0.1, 4|5e-8| |
| | |1, 4|5e-8| |
| | |4, 0.1|1e-7| |
| | |4, 1|1e-7| |
|Chi Squared|degrees of freedom|5|1e-9| |
| | |0.1|Test disabled| |
| | |2|1.5e-8| |
| | |1|5e-7|2e-10|
|F|numerator degrees of freedom, denominator degrees of freedom|5, 6|1e-7| |
| | |1, 2|1e-9| |
| | |2, 1|1e-8| |
| | |5, 2|1e-7| |
| | |10, 1|1e-7| |
| | |100, 100|5e-9| |
|Gamma|Shape, Scale|4, 2|5e-11| |
|Nakagami|Shape, Scale|1/3, 1|1e-7| |
|T|Degrees of freedom|5|1e-8| |
| | |2|5e-11| |

The chi-squared distribution with very small degrees of freedom has CDF(x) 
values of 1.17e-60 for p=0.01, 1.17e-40 for p=0.01. This distribution totally 
fails on the inverse CDF, inverse SF and the random sampling test. Note that 
the random sampler is OK (verified by supplying the quantiles manually) but the 
test requires the quantiles are computed by the distribution and it cannot 
invert p=0.25, 0.5 or 0.75.

 

I have made some modifications to the default inverse method to support 
adjusting the tolerances for the BrentSolver to the minimum allowed values. If 
the values are updated to the following then all the tests pass at their 
default test tolerance (1e-12 relative error). 

 
{code:java}
/** BrentSolver relative accuracy. Currently 1e-14. Change to 1.11e-16. */
private static final double SOLVER_RELATIVE_ACCURACY = 0x1.0p-53;
/** BrentSolver absolute accuracy. Currently 1e-9. */
private static final double SOLVER_ABSOLUTE_ACCURACY = 0;
/** BrentSolver function value accuracy. Currently 1e-15. */
private static final double SOLVER_FUNCTION_VALUE_ACCURACY = Double.MIN_VALUE; 
{code}
 

Notes:
 * The Brent solver reduces the bracket [a, b] surrounding a root and returns 
the value closest to the root (i.e. the function is ~ zero at this point)
 * Relative accuracy (eps) and absolute accuracy (abs) will stop the Brent 
solver reducing the interval when the distance between the two bracket values 
(a and b) is small. 
 * The actual tolerance is a combination of the two tolerances: tol = 2 * eps * 
|x| + abs; x = current min f(x) of (a, b). So if both tolerances are small they 
can combine to be twice as big. To set very small values for the tolerance 
ideally requires setting one of them to zero.
 * Function value accuracy will return the initial bracket value a or b if the 
function at this point is within the function value accuracy for zero. Setting 
this to a very low value forces the Brent solver to perform a search. This 
accuracy is not used during the search, only to skip performing a search.

These threshold changes fix the broken chi-squared distribution. This is due to 
the absolute accuracy being too high. If the expected x values from inverting 
the CDF are around 1e-60 then an absolute tolerance of 1e-9 is not suitable.

Note that the current test suite calls inverse CDF and inverse SF using 
Double.MIN_VALUE. This identified a bug in the inverse probability function for 
a parameterisation of the F distribution:
{noformat}
F(1, 2)
sf(infinity) = 0.0
sf(max_value) = 5.56e-309{noformat}
Calling with min_value requires an x value outside the finite range of a 
double. This raised a bracketing error in the Brent solver. This has been fixed 
and the implementation will now return the correct bound. Once fixed the test 
suite runs using the updated threshold is a similar time. So no extremely time 
consuming searches are made.

To assess the performance impact a JMH test should be created that calls the 
inverse probability function with random p values in [0, 1] with various 
parameterisations of the distributions and different values for the Brent 
solver relative accuracy. I suggest setting the absolute accuracy and function 
accuracy to zero. Thus the inverse should get close to the value X irrespective 
of the magnitude of X.

 

> Optimise the default inverse CDF function for continuous distributions
> ----------------------------------------------------------------------
>
>                 Key: STATISTICS-36
>                 URL: https://issues.apache.org/jira/browse/STATISTICS-36
>             Project: Apache Commons Statistics
>          Issue Type: Improvement
>          Components: distribution
>    Affects Versions: 1.0
>            Reporter: Alex Herbert
>            Priority: Minor
>
> Many distributions only compute the inverse CDF to 1e-9 relative error. Some 
> fail to compute when the input p value maps to a very small x value (e.g. 
> 1e-50). This is related to the defaults for the brent solver used in the 
> default implementation.
> Investigation of tolerances in the brent solver discovered a bug when 
> evaluating very small objective functions ([NUMBERS-168]).
> The brent solver requires a release to be used with this fix so work can use 
> a local copy until all issues with it have been resolved, and the changes 
> ported back to numbers.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Comment Edited] (STATISTICS-36) Optimise the default inverse CDF function for continuous distributions

Reply via email to