[jira] [Commented] (RNG-137) Move class from "Commons Math"

Alex Herbert (Jira) Sat, 03 Jul 2021 06:07:04 -0700


    [ 
https://issues.apache.org/jira/browse/RNG-137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17374011#comment-17374011
 ]

Alex Herbert commented on RNG-137:
----------------------------------

I have re-implemented the StableSampler due to issues in the commons math
version. The version in CM did not have extensive unit tests and when I
investigated further I found the sampler was incorrect for several of the
parameter combinations (alpha=2, alpha != 1). I could not find the reference
provided in the source code for the StableRandomGenerator for the origin of the
formulas. The alpha != 1 formula does not match others I found.
h2. References

Chambers, Mallows & Stuck (1976) "A Method for Simulating Stable Random
Variables". Journal of the American Statistical Association. 71 (354): 340–344.

Weron (1996) "On the Chambers-Mallows-Stuck method for simulating skewed stable
random variables". Statistics & Probability Letters. 28 (2): 165–171.

Nolan (2020) "Univariate Stable Distributions: Models for Heavy Tailed Data".
Springer Series in Operations Research and Financial Engineering. Springer.
Sections 1.7 and 3.3.3.

Hart, JF et al (1968) Computer Approximations, New York: John Wiley & Sons, Inc.
h2. Introduction

The stable distribution is a family of distributions having an alpha (in (0,
2]) and beta (in [-1, 1]) parameter that define the shape (alpha=stability,
beta=skew) and then a scale and location parameter. Special cases include the
Gaussian (alpha=2, beta ignored), Cauchy (alpha=1, beta=0), and Levy
(alpha=0.5, beta=1). The support is in the interval (-inf, inf) unless
beta=+/-1 and alpha < 1. Here the support is to beta * tan(2*alpha/pi), and
this finite limit is upper or lower depending on the sign of beta.

The stable distribution is defined by the characteristic function which
includes a complex number component. There are several parameterisations for
the stable distribution which have different characteristic functions. These
have been formally described in several places but are summarised in Nolan. In
Nolan the scale is named gamma, and the location is named delta.

The 1-parameterisation is that computed by the CM implementation. This
parameterisation is discontinuous at alpha=1. The 0-parameterisation is
continuous at alpha=1. A 0-parameterised stable deviate can be shifted to a
1-parameterised deviate by an offset. The magnitude of the offset approaches
infinity as alpha -> 1 thus the discontinuity. For numerical work the
0-parameterisation is recommended. A standard deviate x in the
0-parameterisation with scale=1, location=0 can be shifted to any deviate by a
linear transform gamma * x + delta.

A method for computing a stable random deviate was published by Chambers et al
and is known as the Chambers-Mallows-Stuck (CMS) method. It requires generation
of a uniform deviate in (-pi/2, pi/2) and an exponential deviate with mean 1.
Two formulas were provided for alpha=1 and alpha != 1 and later a formal proof
of this was provided by Weron. These formulas output a deviate in the
1-parameterisation. A shift (zeta) can be applied to obtain the
0-parameterisation. Close to alpha=1 this shift approaches 1e16 using the JDK
math library to compute tan:
{noformat}
Zeta
Beta
Alpha 1.0 0.5 0.25 0.1 0.0
0.001 0.001571 0.0007854 0.0003927 0.0001571 0.0
0.01 0.01571 0.007855 0.003927 0.001571 0.0
0.05 0.07870 0.03935 0.01968 0.007870 0.0
0.1 0.1584 0.07919 0.03960 0.01584 0.0
0.5 1.000 0.5000 0.2500 0.1000 0.0
0.9 6.314 3.157 1.578 0.6314 0.0
0.95 12.71 6.353 3.177 1.271 0.0
0.99 63.66 31.83 15.91 6.366 0.0
0.995 127.3 63.66 31.83 12.73 0.0
0.999 636.6 318.3 159.2 63.66 0.0
0.9995 1273 636.6 318.3 127.3 0.0
0.9999 6366 3183 1592 636.6 0.0
1.0 1.633E+16 8.166E+15 4.083E+15 1.633E+15 0.0
{noformat}
Since most of the deviates when alpha=1 are in the range [-5, 5] this large
offset to create the deviate reduces the number of significant bits in the
sample due to cancellation.

However the Chambers paper provided a rearrangement of the formulas using a
re-parameterisation and half angle trigonomic identities to create a function
that is continuous as alpha -> 1. This outputs a 0-parameterisation deviate.
The paper provides FORTRAN code to compute the sample (the function RSTAB). I
refer to this as the CMS formula and the alternative as the Weron formula,
although both are implementations of the CMS method.

The CMS algorithm RSTAB uses a function tan( x) / x and calls it using a value
of x in the range (-pi/4, pi/4). This can be approximated using an expansion
summation. The approximation is from Hart. The function is named tan2 in the
Chambers paper and the RSTAB program. This should have the properties:
* For x=0 this returns 1.
* For x=pi/4 this returns 4/pi.
* For x=pi/4 this multiplied by x returns 1.

The original RSTAB program used single precision (32-bit) floats. The
approximation when written using 64-bit doubles did not satisfy the
requirements. When the requirements fail then extreme values of x create values
that result in cancellation in the formula and invalid output (i.e. NaNs). I
located the reference by Hart and used the highest precision approximation for
tan( x) that was available. This did satisfy the requirements.
h2. CMS algorithm

I implemented the CMS algorithm using the JDK Math.tan function and a tan2
approximation from Hart. This algorithm avoids many trigonomic calls to sin and
cos by using half angle identities that reduce to just 2 calls to tan. I also
implemented the variant Weron formulas.

One issue is that the CMS algorithm computes a term z that should be in the
range [0, inf]. This is used with log(z) to compute further terms. The
trigonomic identities can result in z below 0. This creates a NaN. Later
computations can also multiply infinity by 0 or compute inf - inf, which also
create NaN. Thus the algorithm should be checked. After some effort to add
checks to the CMS formula I found it was easier to use the Weron formula for
edge cases. This uses multiplication of terms and can avoid a potential 0 / 0
by restriction of the uniform deviate to (0, 1), i.e. not [0, 1). Thus
infinities can be handled by boxing the infinity to the maximum double value,
computing the expression and then unboxing the result by multiplying by the
maximum double value. This will recreate the infinity from a large value, and
handle multiplication of infinite values by 0.

The CMS and Weron implementations create the same output sample (within
relative error) when used with the same uniform deviate and exponential
deviate, and the deviates are not extreme. This demonstrates they are
rearrangements.

These edge cases of NaN output from the CMS formula are extremely rare unless
the alpha term approaches 0. The following shows a table of the number of times
the term d (computed from z) is infinite:
{noformat}
Beta
Alpha 1.0 0.5 0.25 0.1 0.0
0.05 0.0 0.0 0.0 0.0 0.0
0.02 6.752E-7 6.836E-7 6.761E-7 6.612E-7 7.404E-7
0.01 0.0008493 0.0008533 0.0008541 0.0008546 0.0008539
0.005 0.02889 0.02896 0.02897 0.02897 0.02897
0.001 0.3901 0.3903 0.3903 0.3903 0.3903
{noformat}
When beta=+/-1 the term must also be restricted to the expected domain of the
distribution. The following is the number of times the term was at or outside
the domain when using Math.tan:
{noformat}
Beta
Alpha 1.0 0.5 0.25 0.1 0.0
0.05 0.0003456 0.0 0.0 0.0 0.0
0.02 0.009033 0.0 0.0 0.0 0.0
0.01 0.004025 0.0 0.0 0.0 0.0
0.005 0.1230 0.0 0.0 0.0 0.0
0.001 0.2137 0.0 0.0 0.0 0.0
{noformat}
Or when using the tan2 approximation:
{noformat}
Beta
Alpha 1.0 0.5 0.25 0.1 0.0
0.05 0.001116 0.0 0.0 0.0 0.0
0.02 0.009026 0.0 0.0 0.0 0.0
0.01 0.1650 0.0 0.0 0.0 0.0
0.005 0.2520 0.0 0.0 0.0 0.0
0.001 0.3091 0.0 0.0 0.0 0.0
{noformat}
This shows the use of the tan2 function has more numerical cancellation issues
than using Math.tan. This only applies when beta=+/-1 and alpha is small.
h2. Sampler output

For reference the CDF can be computed using a million samples. The following
shows plot for different alpha and beta parameters.

As alpha -> 0 the distribution is extremely long tailed.

!alpha1.3.jpg!

!alpha0.5.jpg!

!beta0.3.jpg!
h2. Performance

All 3 implementations use a uniform deviate in (0, 1), i.e. 0 must be avoided,
and an exponential deviate. A baseline sampler for this computes both deviates
and multiplies them together. The time for this baseline has been subtracted
from the timings for the 3 implementations. Thus the timings reflect the calls
to Math trigonomic, power, log and exponential functions used by the samplers.

Note that when alpha=0.5 the Weron sampler has a speed increase. The formula
uses two power functions with the exponent as 1/alpha and (1-alpha)/alpha.
These reduce to 2 and 1 respectively when alpha=0.5 thus the power functions
are much faster for this case.

General Case (beta = 0.1, 0.4, 0.7, x-axis = alpha)

!GeneralPerformance.jpg!

Alpha = 1 (x-axis = beta)

!Alpha1Performance.jpg!

Beta = 0 (x-axis = alpha)

!Beta0Performance.jpg!
h2. A robust implementation

The CMS algorithm often will compute NaNs for small alpha and also can do this
for extreme random deviates when the beta term is 1 (these cases are probably
around 1 in 10^15). Use of the Weron formula is simpler to correct. Note that
when alpha is small the term zeta to convert the Weron formula to the
0-parameterisation is very small and will not effect the number of bits of
significance in the sample. Thus a hybrid sampler is a compromise. This can use
the CMS algorithm to create a sample. This will output samples in the
0-parameterisation which is continuous for alpha and beta. It is checked
against the bounds of the distribution support and if not within the expected
support the Weron formula is used to correct NaN or check infinite results.
When alpha is small then the Weron formula can be used directly as the zeta
shift has a small magnitude and will not effect output precision. A value of
alpha < 0.02 is a level where approximately 1 in million samples will have an
infinite term in the computation.
h2. Approximate tan( x) / x

The sampler using tan2 (tan( x) / x) is faster by about 20-30%. I tested the
function with x in the range [0, pi/4] and the mean ULP is <0.02 from using
Math.tan. This is a high level of precision. Note the same test using the
original tan2 function from Chambers et al has a mean ULP of 148293.94. The
approximations are 4283 and 4288 from Hart. This states the maximum relative
error (verses tan) is 1e-10.66 and 1e-26.68 respectively.

Since this is sampling and not a high accuracy computation of the PDF at a
given point then use of the approximation seems justified by the faster
sampling speed. Any output will be checked against the support and the
correction of invalid samples using the Weron formula will use the JDK Math
trigonomic functions.

Note:

I created these implementation using the AhrensDieterExponentialSampler. Many
of the unit tests manipulate the RNG to create extreme output from the
exponential deviate. These should be updated to use the
ZigguratExponentialSampler. I will do this then open a PR for review with the
implementations.

> Move class from "Commons Math"
> ------------------------------
>
> Key: RNG-137
> URL: https://issues.apache.org/jira/browse/RNG-137
> Project: Commons RNG
> Issue Type: Task
> Components: sampling
> Reporter: Gilles Sadowski
> Assignee: Alex Herbert
> Priority: Trivial
> Labels: commons-math, port
> Fix For: 1.4
>
> Attachments: Alpha1Performance.jpg, Beta0Performance.jpg,
> GeneralPerformance.jpg, alpha0.5.jpg, alpha1.3.jpg, beta0.3.jpg
>
>
> Shouldn't CM's class
> [{{StableRandomGenerator}}|https://gitbox.apache.org/repos/asf?p=commons-math.git;a=blob;f=src/main/java/org/apache/commons/math4/random/StableRandomGenerator.java;h=f3a851b9b16c47546cac8371e272dbf59a909c09;hb=HEAD]
> be moved to the \{{commons-rng-sampling}} module?

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (RNG-137) Move class from "Commons Math"

Reply via email to