[ https://issues.apache.org/jira/browse/NUMBERS-177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17454141#comment-17454141 ]

Alex Herbert commented on NUMBERS-177:
--------------------------------------

I have investigated various ways to compute the scaled complementary error 
function. Reference values were computed using the Boost erfc function 
evaluated at 128-bit precision when z was within the range of that function 
(z < 100); the function was modified to omit the scaling down by exp(-z*z) so 
that it outputs the rational function approximation for erfcx(z) directly. 
Above this range erfcx was evaluated using the continued fraction for the 
upper incomplete gamma function (see [Erfc eq 3|https://mathworld.wolfram.com/Erfc.html]):
{noformat}
erfcx(z) = exp(z*z) * erfc(z)

erfc(z) = Gamma(0.5, z*z) / sqrt(pi)
{noformat}
The current rational function approximation for erfc is valid for the range 
[4, 28]. Above this range the approximation has increasingly large errors.

I tried a few variations of different continued fractions for erfc. The best 
of these was the same upper incomplete gamma function evaluated using 64-bit 
precision. However, this continued fraction requires a variable number of 
iterations to converge (up to 8 for z ~ 28). Attempts to explicitly evaluate 
the nth convergent with various numbers of terms did not provide a 
consistently low error across the entire range [28, 6.71e7].

This paper presents a rational function that is suitable for use up to the 
limit:
{noformat}
"Rational Chebyshev approximations for the error function"
by W. J. Cody, Math. Comp., 1969, pp. 631-638.{noformat}
There is a FORTRAN implementation in [SpecFun 
erf|https://www.netlib.org/specfun/erf]. I obtained the paper, which provides 
constants valid for evaluation up to long double precision; this was useful 
for testing.

I tested the various implementations using random input arguments in the 
applicable range. Values were sampled by taking the IEEE 754 64-bit 
representations of the range min and max, sampling a uniform long between 
them, and mapping the result back to a double. Because floating-point numbers 
use a base-2 representation, the limiting distribution of this scheme is 
log-uniform: when plotted on a log2 scale the points are approximately 
uniform.
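A minimal sketch of this sampling scheme (class and method names are illustrative, not the actual test harness):

```java
import java.util.concurrent.ThreadLocalRandom;

/** Sketch of the sampling: a uniform long between the IEEE 754 bit patterns
 *  of min and max, mapped back to a double. For positive finite doubles the
 *  bit patterns are ordered, so the result lies in [min, max]. */
public class BitUniformSampler {
    static double sample(double min, double max) {
        long lo = Double.doubleToLongBits(min);
        long hi = Double.doubleToLongBits(max);
        long bits = ThreadLocalRandom.current().nextLong(lo, hi + 1);
        return Double.longBitsToDouble(bits);
    }
}
```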

The range was split into bins and the maximum and RMS ulp error recorded for 
each bin. The following results all used 1e9 points split into 10000 bins.
h2. Erfcx

I have tested the rational function provided by Boost and Cody over the 
applicable range. It is important to note that the range is split and evaluated 
with different rational functions:
||Implementation||Range||
|Boost|[0.5, 1.5], [1.5, 2.5], [2.5, 4.5], [4.5, 28]|
|Cody|[0.5, 4], [4, 6.71e7]|

Here is the max error over the range covered by erfc:

!erfcx.medium.png!

Note:
 * The Boost function shows the characteristic oscillating error wave form of 
a minimax approximation.
 * The Cody function is more consistent at z > 4, but the single Cody function 
for z < 4 is not as good as the Boost functions over that region. Clearly this 
region is difficult to evaluate with a single function: it contains the points 
where erfcx(z) = 0.5 (z ~ 0.7689) and where exp(-z*z) = z (z ~ 0.6529).
 * The Boost function error starts to increase rapidly at z ~ 33.

There are breaks in the max error line at certain points, most evident for 
z > 4. They occur when the value of erfcx (which decreases as z increases) 
drops below the next power of two: the ULP of the expected result halves, so 
the same error measured in absolute distance is suddenly twice as large when 
expressed in ULPs of a double.
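This effect can be demonstrated directly: the spacing of doubles halves below a power of two, so a fixed absolute error doubles when measured in ULPs.

```java
/** Demonstration: the ULP of a double halves below a power of two. */
public class UlpBreak {
    static double ulpRatio() {
        double above = Math.nextUp(0.25);   // just above 2^-2
        double below = Math.nextDown(0.25); // just below 2^-2
        return Math.ulp(above) / Math.ulp(below);
    }
}
```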

For reference I have used the constants provided by Cody and evaluated the 
function for z in [0.5, 4] using a long double (64-bit mantissa). The max 
error is < 0.5 ULP. Thus the approximation itself is correct, but it is 
subject to error when evaluated using double precision. The function tested 
uses 8 terms for each polynomial to compute P and Q for the result P/Q. Cody 
provides rational approximations using fewer terms with a lower theoretical 
error bound that is still sufficient for double precision; I tried these and 
the results were worse.
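The evaluation structure is a ratio of two polynomials in Horner form; a generic sketch is below (the coefficient arrays are placeholders, not Cody's published constants):

```java
/** Sketch of the P/Q rational function evaluation. The coefficients passed
 *  in are placeholders; Cody's constants are not reproduced here. */
public class RationalEval {
    static double ratio(double[] p, double[] q, double x) {
        // Horner evaluation of both polynomials, highest-order term first.
        double pv = p[0];
        for (int i = 1; i < p.length; i++) {
            pv = pv * x + p[i];
        }
        double qv = q[0];
        for (int i = 1; i < q.length; i++) {
            qv = qv * x + q[i];
        }
        return pv / qv;
    }
}
```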

At larger z the Boost function is not a good approximation. It has clearly been 
optimised for use with z < 28. Here is a plot of the max error for the Cody 
function against the continued fraction evaluation.

!erfcx.large.png!

The Cody function maintains good accuracy up to the limit of 6.71e7. It can be 
evaluated without iterating a continued fraction, so it should also be faster.
h2. Effect on Erfc

The Cody function is suitable for erfcx. I applied the same test to erfc. This 
is a simple scaling by exp(-z*z).

!erfc.png!

The results are as expected. At z < 4 the Boost function works better. At z > 4 
there is little difference between the two. This suggests that the Boost 
function can be replaced with the Erfc function provided by Cody.

The error is roughly double that observed for erfcx. This is because erfc is 
smaller, so its ULP is smaller and the same absolute error is larger in ULP 
terms, and because the scaling by exp(-z*z) introduces its own error.

Note that although both functions have low error when z > 26.5, the Boost 
function is worse. This is the region where both erfc(z) and exp(-z*z) are 
sub-normal. The Boost function scales by exp(-z*z) / z. The Cody function 
computes erfcx(z) using a function divided by z and then scales to erfc(z) by 
multiplying by exp(-z*z). This means that a sub-normal exp(-z*z) is never 
divided by z, avoiding the corresponding loss of a bit of precision.
h2. Accurate scaling by exp(z*z)

The erfc function requires scaling by exp(-z*z). The erfcx function requires 
computation of exp(z*z) to high accuracy when z is negative to evaluate:
{code:java}
erfcx(z) = 2 * exp(z*z) - erfcx(-z)
{code}
The current implementation uses the following method to compute this to high 
accuracy:
{code:java}
exp(a+b) = exp(a) * exp(b)
{code}
The value z*z can be evaluated as a double length number with high part a and 
low part b as the round-off error.

Note that the above can be reformulated as:
{code:java}
exp(a+b) = exp(a) * (exp(b) - 1) + exp(a)
{code}
Here exp(b) - 1 can be evaluated using expm1. However, note that the round-off 
b for any value a that will not overflow exp is limited to 0.5 * 
ulp(log(MAX_VALUE)) = 5.68e-14. This is small, so the series expansion of 
exp(b) can be used with a few terms:
{code:java}
exp(b) = 1 + b/1! + b^2/2! + b^3/3! + ...
expm1(b) ~ b + b^2/2{code}
In practice, although the b^2/2 term is required for a correctly rounded 
expm1(b) when b < 5.68e-14, it does not alter the product with exp(a) by 
enough to change the result. The high precision result is then:
{code:java}
exp(a+b) = exp(a) * (exp(b) - 1) + exp(a)
         = exp(a) * (b + b^2/2) + exp(a)
         = exp(a) * b + exp(a)
{code}
This has the advantage of 1 less evaluation of exp as the term exp(b) is not 
required.
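The full computation can be sketched as below; Math.fma (Java 9+) is used here for brevity to obtain the round-off of z*z exactly, whereas a Dekker split works on older JDKs. This is a sketch under those assumptions, not the implementation under test:

```java
/** Sketch: exp(z*z) with the round-off correction exp(a)*b + exp(a). */
public class ExpZZ {
    static double expzz(double z) {
        double a = z * z;
        // b is the exact round-off of z*z: fma computes z*z - a without error.
        double b = Math.fma(z, z, -a);
        double ea = Math.exp(a);
        // exp(a + b) ~ exp(a) * b + exp(a) since b is tiny (see above).
        return ea * b + ea;
    }
}
```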

Here is the same plot for the max error of various forms for exp(-(a+b)) 
evaluated at double and long double precision (this is for scaling down by 
exp(-x*x)).

!expn.png!

When a > 2 the standard result exp(a) (purple) is off the chart. At the limit 
of z around 27 the max ulp error is above 500.

The current exp(a)*exp(b) implementation (green) has a large amount of 
variation and a max error of 2. The implementation using the expm1(b) 
approximation (blue) has a max error of 1 ulp for most of the range.

There is a tiny part of the range where the ulp error is above 1 (max measured 
at 1.24 ulp). This occurs when exp(-a) is close to sub-normal: the product 
exp(-a) * b is sub-normal, so the adjustment added to exp(-a) has lost a few 
bits of precision. In this case this is not an issue. The sub-normal result 
exp(-a) will be multiplied by a number smaller than 1 (e.g. erfcx(27) = 
0.0209) and the final result for erfc(z) is sub-normal with even less 
precision.

I have shown the result for exp(-a). For exp(a) there is not a problem with 
sub-normal results and the entire range has max error of 1 ulp.

Note that for z < 1 the standard precision exp result cannot be improved. This 
is because:
{noformat}
0 < |a| < 1
expm1(b) ~ b
b < 0.5 * ulp(a) < epsilon (2^-52)

=> b*exp(a) + exp(a) == exp(a){noformat}
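This can be checked directly (a minimal sketch; the class name is illustrative):

```java
/** Sketch: for |a| < 1 the round-off b of z*z is below half an ULP of a,
 *  and the correction exp(a)*b is too small to change exp(a). */
public class SmallArgCheck {
    static boolean correctionVanishes(double a) {
        double b = 0.25 * Math.ulp(a); // any b < 0.5 * ulp(a)
        double ea = Math.exp(a);
        return ea * b + ea == ea;
    }
}
```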
If the exp function can be evaluated in long double then the max error is 0.75 
ulp. If the expm1(b) adjustment is also used then the long double evaluation 
has 0.5 ulp max error.

This extra precision can be obtained by passing the round-off bits b into the 
exp function if supported. Commons Math 4.0's AccurateMath provides this 
feature internally. I extracted the code for AccurateMath.exp and passed the 
extra bits. This solved all the unit test cases where I had captured examples 
of a max ulp above 1.0. The update does not really change the overall accuracy 
of the function. It improves the max error by 0.5 ulp for the erfcx(z < -0.5) 
dataset, but the current test data has only 900 reference points for erfcx, so 
worst-case errors of 1 in a billion may improve by more. The implementation 
just passes a and b to AccurateMath:
{code:java}
// Compute a+b
// null is an array double[] for extra bits of precision on output
double e = AccurateMath.exp(a, b, null);{code}
This is an internal API that I exposed for testing. A more sensible API would 
be:
{code:java}
// Compute exp(x*y)
AccurateMath.exp(double x, double y);{code}
AccurateMath can then compute the parts a=x*y and b=round-off(x*y) and generate 
the result using the extra bits of precision. This should create a function 
accurate to max 0.5 ulp for the entire range of finite output. Scaling would be 
done using:
{code:java}
AccurateMath.exp(-z, z);
AccurateMath.exp(z, z);{code}
h2. Conclusion

The Boost Erfc function approximation is not suitable over the entire range of 
erfcx. A rational approximation by Cody provides similar accuracy to the Boost 
erfc function in the range [4, 28] and continues to have low maximum error up 
to the limit of 6.71e7. Above this the approximation erfcx(z) = 1/sqrt(pi) / z 
can be used.
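The limiting branch can be sketched as below (illustrative names):

```java
/** Sketch: above z = 2^26 the term z*z + 0.5 is indistinguishable from z*z
 *  in double precision and erfcx(z) reduces to 1 / (sqrt(pi) * z). */
public class ErfcxLimit {
    static double erfcxLimit(double z) {
        return 1.0 / (Math.sqrt(Math.PI) * z);
    }
}
```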

The scaling by exp(z*z) can be improved by altering the adjustment made to the 
standard precision exp result to use an approximation of expm1 of the 
round-off part of z*z.

 

> Scaled complementary error function
> -----------------------------------
>
>                 Key: NUMBERS-177
>                 URL: https://issues.apache.org/jira/browse/NUMBERS-177
>             Project: Commons Numbers
>          Issue Type: New Feature
>          Components: gamma
>    Affects Versions: 1.1
>            Reporter: Alex Herbert
>            Priority: Minor
>             Fix For: 1.1
>
>         Attachments: erfc.png, erfcx.large.png, erfcx.medium.png, expn.png
>
>
> Add a scaled complementary error function:
> {noformat}
> erfcx(x) = exp(x^2) * erfc(x)
> {noformat}
> For small x this can use the current rational function approximations for erf 
> or erfc and remove the scaling down by exp(-x^2). For large x this can use 
> the [continued fraction expansion of 
> erfc|https://en.wikipedia.org/wiki/Error_function#Continued_fraction_expansion]
>  and remove the scaling by e^-z^2:
> {noformat}
>              z                    1
> erfc z = -------- e^-z^2 -----------------------------
>          sqrt(pi)        z^2 +           1/2
>                                -----------------------
>                                1 +         1
>                                    -------------------
>                                    z^2 +      3/2
>                                          -------------
>                                          1 +     2
>                                              ---------
>                                              z^2 + ...
> {noformat}
> At very large x the continued fraction cannot be evaluated as {{z^2 + 0.5 ~ 
> z^2}} and the following approximation can be used:
> {noformat}
>               1      1
> erfcx z = -------- * -
>           sqrt(pi)   z
> {noformat}
> This occurs at 2^26 or approximately 6.71e7.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
