[
https://issues.apache.org/jira/browse/STATISTICS-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17460703#comment-17460703
]
Alex Herbert edited comment on STATISTICS-36 at 12/16/21, 1:26 PM:
-------------------------------------------------------------------
Each distribution using the default inverse probability method has at least one
test case where the overall test tolerance is limited by the inverse
probability test.
By default all the tests for a distribution should pass at a relative error of
1e-12. The problem cases are listed below:
||Distribution||Params||Value||Relative Error||Absolute Error||
|Beta|alpha, beta|0.1, 4|5e-8| |
| | |1, 4|5e-8| |
| | |4, 0.1|1e-7| |
| | |4, 1|1e-7| |
|Chi Squared|degrees of freedom|5|1e-9| |
| | |0.1|Test disabled| |
| | |2|1.5e-8| |
| | |1|5e-7|2e-10|
|F|numerator degrees of freedom, denominator degrees of freedom|5, 6|1e-7| |
| | |1, 2|1e-9| |
| | |2, 1|1e-8| |
| | |5, 2|1e-7| |
| | |10, 1|1e-7| |
| | |100, 100|5e-9| |
|Gamma|Shape, Scale|4, 2|5e-11| |
|Nakagami|Shape, Scale|1/3, 1|1e-7| |
|T|Degrees of freedom|5|1e-8| |
| | |2|5e-11| |
The chi-squared distribution with very small degrees of freedom has CDF( x )
values of 1.17e-60 for p=0.01, 1.17e-40 for p=0.01. This distribution totally
fails on the inverse CDF, inverse SF and the random sampling test. Note that
the random sampler is OK (verified by supplying the quantiles manually) but the
test requires the quantiles are computed by the distribution and it cannot
invert p=0.25, 0.5 or 0.75.
I have made some modifications to the default inverse method to support
adjusting the tolerances for the BrentSolver to the minimum allowed values. If
the values are updated to the following then all the tests pass at their
default test tolerance (1e-12 relative error).
{code:java}
/** BrentSolver relative accuracy. Currently 1e-14. Change to 1.11e-16. */
private static final double SOLVER_RELATIVE_ACCURACY = 0x1.0p-53;
/** BrentSolver absolute accuracy. Currently 1e-9. */
private static final double SOLVER_ABSOLUTE_ACCURACY = 0;
/** BrentSolver function value accuracy. Currently 1e-15. */
private static final double SOLVER_FUNCTION_VALUE_ACCURACY = Double.MIN_VALUE;
{code}
Notes:
* The Brent solver reduces the bracket [a, b] surrounding a root and returns
the value closest to the root (i.e. the function is ~ zero at this point)
* Relative accuracy (eps) and absolute accuracy (abs) will stop the Brent
solver reducing the interval when the distance between the two bracket values
(a and b) is small.
* The actual tolerance is a combination of the two tolerances: tol = 2 * eps *
|x| + abs; x = current min f( x ) of (a, b). So if both tolerances are small
they can combine to be twice as big. To set very small values for the tolerance
ideally requires setting one of them to zero.
* Function value accuracy will return the initial bracket value a or b if the
function at this point is within the function value accuracy for zero. Setting
this to a very low value forces the Brent solver to perform a search. This
accuracy is not used during the search, only to skip performing a search.
These threshold changes fix the broken chi-squared distribution. This is due to
the absolute accuracy being too high. If the expected x values from inverting
the CDF are around 1e-60 then an absolute tolerance of 1e-9 is not suitable.
Note that the current test suite calls inverse CDF and inverse SF using
Double.MIN_VALUE. This identified a bug in the inverse probability function for
a parameterisation of the F distribution:
{noformat}
F(1, 2)
sf(infinity) = 0.0
sf(max_value) = 5.56e-309{noformat}
Calling with min_value requires an x value outside the finite range of a
double. This raised a bracketing error in the Brent solver. This has been fixed
and the implementation will now return the correct bound. Once fixed the test
suite runs using the updated threshold is a similar time. So no extremely time
consuming searches are made.
To assess the performance impact a JMH test should be created that calls the
inverse probability function with random p values in [0, 1] with various
parameterisations of the distributions and different values for the Brent
solver relative accuracy. I suggest setting the absolute accuracy and function
accuracy to zero. Thus the inverse should get close to the value X irrespective
of the magnitude of X.
was (Author: alexherbert):
Each distribution using the default inverse probability method has at least one
test case where the overall test tolerance is limited by the inverse
probability test.
By default all the tests for a distribution should pass at a relative error of
1e-12. The problem cases are listed below:
||Distribution||Params||Value||Relative Error||Absolute Error||
|Beta|alpha, beta|0.1, 4|5e-8| |
| | |1, 4|5e-8| |
| | |4, 0.1|1e-7| |
| | |4, 1|1e-7| |
|Chi Squared|degrees of freedom|5|1e-9| |
| | |0.1|Test disabled| |
| | |2|1.5e-8| |
| | |1|5e-7|2e-10|
|F|numerator degrees of freedom, denominator degrees of freedom|5, 6|1e-7| |
| | |1, 2|1e-9| |
| | |2, 1|1e-8| |
| | |5, 2|1e-7| |
| | |10, 1|1e-7| |
| | |100, 100|5e-9| |
|Gamma|Shape, Scale|4, 2|5e-11| |
|Nakagami|Shape, Scale|1/3, 1|1e-7| |
|T|Degrees of freedom|5|1e-8| |
| | |2|5e-11| |
The chi-squared distribution with very small degrees of freedom has CDF(x)
values of 1.17e-60 for p=0.01, 1.17e-40 for p=0.01. This distribution totally
fails on the inverse CDF, inverse SF and the random sampling test. Note that
the random sampler is OK (verified by supplying the quantiles manually) but the
test requires the quantiles are computed by the distribution and it cannot
invert p=0.25, 0.5 or 0.75.
I have made some modifications to the default inverse method to support
adjusting the tolerances for the BrentSolver to the minimum allowed values. If
the values are updated to the following then all the tests pass at their
default test tolerance (1e-12 relative error).
{code:java}
/** BrentSolver relative accuracy. Currently 1e-14. Change to 1.11e-16. */
private static final double SOLVER_RELATIVE_ACCURACY = 0x1.0p-53;
/** BrentSolver absolute accuracy. Currently 1e-9. */
private static final double SOLVER_ABSOLUTE_ACCURACY = 0;
/** BrentSolver function value accuracy. Currently 1e-15. */
private static final double SOLVER_FUNCTION_VALUE_ACCURACY = Double.MIN_VALUE;
{code}
Notes:
* The Brent solver reduces the bracket [a, b] surrounding a root and returns
the value closest to the root (i.e. the function is ~ zero at this point)
* Relative accuracy (eps) and absolute accuracy (abs) will stop the Brent
solver reducing the interval when the distance between the two bracket values
(a and b) is small.
* The actual tolerance is a combination of the two tolerances: tol = 2 * eps *
|x| + abs; x = current min f(x) of (a, b). So if both tolerances are small they
can combine to be twice as big. To set very small values for the tolerance
ideally requires setting one of them to zero.
* Function value accuracy will return the initial bracket value a or b if the
function at this point is within the function value accuracy for zero. Setting
this to a very low value forces the Brent solver to perform a search. This
accuracy is not used during the search, only to skip performing a search.
These threshold changes fix the broken chi-squared distribution. This is due to
the absolute accuracy being too high. If the expected x values from inverting
the CDF are around 1e-60 then an absolute tolerance of 1e-9 is not suitable.
Note that the current test suite calls inverse CDF and inverse SF using
Double.MIN_VALUE. This identified a bug in the inverse probability function for
a parameterisation of the F distribution:
{noformat}
F(1, 2)
sf(infinity) = 0.0
sf(max_value) = 5.56e-309{noformat}
Calling with min_value requires an x value outside the finite range of a
double. This raised a bracketing error in the Brent solver. This has been fixed
and the implementation will now return the correct bound. Once fixed the test
suite runs using the updated threshold is a similar time. So no extremely time
consuming searches are made.
To assess the performance impact a JMH test should be created that calls the
inverse probability function with random p values in [0, 1] with various
parameterisations of the distributions and different values for the Brent
solver relative accuracy. I suggest setting the absolute accuracy and function
accuracy to zero. Thus the inverse should get close to the value X irrespective
of the magnitude of X.
> Optimise the default inverse CDF function for continuous distributions
> ----------------------------------------------------------------------
>
> Key: STATISTICS-36
> URL: https://issues.apache.org/jira/browse/STATISTICS-36
> Project: Apache Commons Statistics
> Issue Type: Improvement
> Components: distribution
> Affects Versions: 1.0
> Reporter: Alex Herbert
> Priority: Minor
>
> Many distributions only compute the inverse CDF to 1e-9 relative error. Some
> fail to compute when the input p value maps to a very small x value (e.g.
> 1e-50). This is related to the defaults for the brent solver used in the
> default implementation.
> Investigation of tolerances in the brent solver discovered a bug when
> evaluating very small objective functions ([NUMBERS-168]).
> The brent solver requires a release to be used with this fix so work can use
> a local copy until all issues with it have been resolved, and the changes
> ported back to numbers.
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)