[
https://issues.apache.org/jira/browse/STATISTICS-52?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17480077#comment-17480077
]
Alex Herbert commented on STATISTICS-52:
----------------------------------------
The previous JMH benchmark only tested the call to the exponential function.
However the normal distribution PDF requires more computations:
{code:java}
// Provided
double mean, sd;
// Precomputed normalisation factor
double sdSqrt2pi = sd * Math.sqrt(2 * Math.PI);

double density(double x) {
    final double z = (x - mean) / sd;
    return Math.exp(-0.5 * z * z) / sdSqrt2pi;
}{code}
Thus the density computation must first normalise the input value x, compute
the exponential and perform a divide. Even if the mean and standard deviation
are 0 and 1, the divide uses a non-trivial value of approximately 2.5066. The
benchmark has therefore been updated to include these extra steps.
Note the extra columns in the results simply remove the time for generation of
the random deviate measured in the baseline method:
{noformat}
baseline = generation of random X deviate
std      = standard precision PDF
hp       = high precision PDF
Adjusted = Score[method] - Score[baseline]
Relative = Adjusted[hp] / Adjusted[std]{noformat}
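For example, the derived columns can be reproduced directly from the raw scores
reported for the normally distributed X deviate in the table below:
{code:java}
// Derive the Adjusted and Relative columns from the raw JMH scores.
double baseline = 8.878;
double std = 34.722;
double hp = 35.897;
double adjustedStd = std - baseline;        // 25.844
double adjustedHp = hp - baseline;          // 27.019
double relative = adjustedHp / adjustedStd; // ~1.045
{code}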
h2. Normally distributed X deviate
||Method||Score||Adjusted||Relative||
| baseline|8.878| | |
| std|34.722|25.844| |
| hp|35.897|27.019|1.045|
On this data the difference is smaller (4.5%) than the previously observed 10%
for the exp function in isolation.
h2. Uniformly distributed X deviate
||Method||Low||High||Score||Adjusted||Relative||
| baseline|0|1|8.94| | |
| std|0|1|36.947|28.007| |
| hp|0|1|36.858|27.918|0.997|
| baseline|0|10|8.902| | |
| std|0|10|37.161|28.259| |
| hp|0|10|40.043|31.141|1.102|
| baseline|0|30|8.841| | |
| std|0|30|36.84|27.999| |
| hp|0|30|39.684|30.843|1.102|
| baseline|0|100|8.981| | |
| std|0|100|32.619|23.638| |
| hp|0|100|28.142|19.161|0.811|
| baseline|2|20|9.184| | |
| std|2|20|36.932|27.748| |
| hp|2|20|39.526|30.342|1.093|
| baseline|0|2.83|8.894| | |
| std|0|2.83|36.879|27.985| |
| hp|0|2.83|41.707|32.813|1.173|
When the full PDF is computed the relative speed difference is minor compared
to the previous benchmark of the exp function in isolation.
* When the computation is entirely standard precision ([0, 1]) there is no
speed difference.
* When it is entirely high precision ([2, 20]) it is about 10% slower.
* When the function must choose between the standard precision computation
(x^2 < 2) and high precision it is again about 10% slower on the [0, 10]
and [0, 30] data.
* In the worst-case scenario of [0, 2.83] the random deviate value x^2 will be
< 2 approximately 50% of the time. This is 17% slower.
* On the [0, 100] data the high precision method is faster; on this data about
60% of the time the computation will not call Math.exp (presumably because
exp(-0.5 * x^2) underflows to zero for large x).
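The branch structure implied by these observations can be sketched as below.
This is a hypothetical illustration, not the actual Commons Statistics
implementation: the x^2 < 2 threshold is taken from the observations above, the
round-off of the squaring is recovered here with Math.fma, and the first-order
correction exp(-0.5 * r) ~ 1 - 0.5 * r follows the idea described in
NUMBERS-177.
{code:java}
// Hypothetical sketch of the standard/high precision branching.
static double density(double x, double mean, double sd) {
    final double sdSqrt2pi = sd * Math.sqrt(2 * Math.PI);
    final double z = (x - mean) / sd;
    final double zz = z * z;
    if (zz > 1490) {
        // exp(-0.5 * zz) underflows to zero once -0.5 * zz is below
        // about -745; no call to Math.exp is required.
        return 0;
    }
    if (zz <= 2) {
        // Standard precision is sufficient.
        return Math.exp(-0.5 * zz) / sdSqrt2pi;
    }
    // Round-off of the squaring: z * z = zz + round (to extended precision).
    final double round = Math.fma(z, z, -zz);
    // exp(-0.5 * (zz + round)) ~ exp(-0.5 * zz) * (1 - 0.5 * round)
    return Math.exp(-0.5 * zz) * (1 - 0.5 * round) / sdSqrt2pi;
}
{code}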
h2. Conclusion
Given that the CDF computation for the normal distribution also uses a high
precision exp function within the error function (erf) to increase accuracy,
it would be consistent to add the high precision PDF. The overall speed impact
is minor: around 5% slower on normally distributed X data and 17% slower on
worst-case uniformly distributed input data.
> High precision PDF for the Normal distribution
> ----------------------------------------------
>
> Key: STATISTICS-52
> URL: https://issues.apache.org/jira/browse/STATISTICS-52
> Project: Apache Commons Statistics
> Issue Type: Improvement
> Components: distribution
> Affects Versions: 1.0
> Reporter: Alex Herbert
> Priority: Minor
>
> The normal distribution PDF is computed using:
>
> {code:java}
> Math.exp(-0.5 * x * x) / Math.sqrt(2 * Math.PI)
> {code}
> The value {{x^2}} can be computed to extended precision. This extra
> information in the round-off bits can increase the accuracy of the
> exponential function (see NUMBERS-177 under the title 'Accurate scaling by
> exp(z*z)').
>
> The effect of including the round-off bits on both accuracy and speed should
> be investigated.
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)