[ https://issues.apache.org/jira/browse/RNG-137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17374011#comment-17374011 ]
Alex Herbert commented on RNG-137: ---------------------------------- I have re-implemented the StableSampler due to issues in the commons math version. The version in CM did not have extensive unit tests and when I investigated further I found the sampler was incorrect for several of the parameter combinations (alpha=2, alpha != 1). I could not find the reference provided in the source code for the StableRandomGenerator for the origin of the formulas. The alpha != 1 formula does not match others I found. h2. References Chambers, Mallows & Stuck (1976) "A Method for Simulating Stable Random Variables". Journal of the American Statistical Association. 71 (354): 340–344. Weron (1996) "On the Chambers-Mallows-Stuck method for simulating skewed stable random variables". Statistics & Probability Letters. 28 (2): 165–171. Nolan (2020) "Univariate Stable Distributions: Models for Heavy Tailed Data". Springer Series in Operations Research and Financial Engineering. Springer. Sections 1.7 and 3.3.3. Hart, JF et al (1968) Computer Approximations, New York: John Wiley & Sons, Inc. h2. Introduction The stable distribution is a family of distributions having an alpha (in (0, 2]) and beta (in [-1, 1]) parameter that define the shape (alpha=stability, beta=skew) and then a scale and location parameter. Special cases include the Gaussian (alpha=2, beta ignored), Cauchy (alpha=1, beta=0), and Levy (alpha=0.5, beta=1). The support is in the interval (-inf, inf) unless beta=+/-1 and alpha < 1. Here the support is to beta * tan(2*alpha/pi), and this finite limit is upper or lower depending on the sign of beta. The stable distribution is defined by the characteristic function which includes a complex number component. There are several parameterisations for the stable distribution which have different characteristic functions. These have been formally described in several places but are summarised in Nolan. In Nolan the scale is named gamma, and the location is named delta. The 1-parameterisation is that computed by the CM implementation. This parameterisation is discontinuous at alpha=1. The 0-parameterisation is continuous at alpha=1. A 0-parameterised stable deviate can be shifted to a 1-parameterised deviate by an offset. The magnitude of the offset approaches infinity as alpha -> 1 thus the discontinuity. For numerical work the 0-parameterisation is recommended. A standard deviate x in the 0-parameterisation with scale=1, location=0 can be shifted to any deviate by a linear transform gamma * x + delta. A method for computing a stable random deviate was published by Chambers et al and is known as the Chambers-Mallows-Stuck (CMS) method. It requires generation of a uniform deviate in (-pi/2, pi/2) and an exponential deviate with mean 1. Two formulas were provided for alpha=1 and alpha != 1 and later a formal proof of this was provided by Weron. These formulas output a deviate in the 1-parameterisation. A shift (zeta) can be applied to obtain the 0-parameterisation. Close to alpha=1 this shift approaches 1e16 using the JDK math library to compute tan: {noformat} Zeta Beta Alpha 1.0 0.5 0.25 0.1 0.0 0.001 0.001571 0.0007854 0.0003927 0.0001571 0.0 0.01 0.01571 0.007855 0.003927 0.001571 0.0 0.05 0.07870 0.03935 0.01968 0.007870 0.0 0.1 0.1584 0.07919 0.03960 0.01584 0.0 0.5 1.000 0.5000 0.2500 0.1000 0.0 0.9 6.314 3.157 1.578 0.6314 0.0 0.95 12.71 6.353 3.177 1.271 0.0 0.99 63.66 31.83 15.91 6.366 0.0 0.995 127.3 63.66 31.83 12.73 0.0 0.999 636.6 318.3 159.2 63.66 0.0 0.9995 1273 636.6 318.3 127.3 0.0 0.9999 6366 3183 1592 636.6 0.0 1.0 1.633E+16 8.166E+15 4.083E+15 1.633E+15 0.0 {noformat} Since most of the deviates when alpha=1 are in the range [-5, 5] this large offset to create the deviate reduces the number of significant bits in the sample due to cancellation. However the Chambers paper provided a rearrangement of the formulas using a re-parameterisation and half angle trigonomic identities to create a function that is continuous as alpha -> 1. This outputs a 0-parameterisation deviate. The paper provides FORTRAN code to compute the sample (the function RSTAB). I refer to this as the CMS formula and the alternative as the Weron formula, although both are implementations of the CMS method. The CMS algorithm RSTAB uses a function tan( x) / x and calls it using a value of x in the range (-pi/4, pi/4). This can be approximated using an expansion summation. The approximation is from Hart. The function is named tan2 in the Chambers paper and the RSTAB program. This should have the properties: * For x=0 this returns 1. * For x=pi/4 this returns 4/pi. * For x=pi/4 this multiplied by x returns 1. The original RSTAB program used single precision (32-bit) floats. The approximation when written using 64-bit doubles did not satisfy the requirements. When the requirements fail then extreme values of x create values that result in cancellation in the formula and invalid output (i.e. NaNs). I located the reference by Hart and used the highest precision approximation for tan( x) that was available. This did satisfy the requirements. h2. CMS algorithm I implemented the CMS algorithm using the JDK Math.tan function and a tan2 approximation from Hart. This algorithm avoids many trigonomic calls to sin and cos by using half angle identities that reduce to just 2 calls to tan. I also implemented the variant Weron formulas. One issue is that the CMS algorithm computes a term z that should be in the range [0, inf]. This is used with log(z) to compute further terms. The trigonomic identities can result in z below 0. This creates a NaN. Later computations can also multiply infinity by 0 or compute inf - inf, which also create NaN. Thus the algorithm should be checked. After some effort to add checks to the CMS formula I found it was easier to use the Weron formula for edge cases. This uses multiplication of terms and can avoid a potential 0 / 0 by restriction of the uniform deviate to (0, 1), i.e. not [0, 1). Thus infinities can be handled by boxing the infinity to the maximum double value, computing the expression and then unboxing the result by multiplying by the maximum double value. This will recreate the infinity from a large value, and handle multiplication of infinite values by 0. The CMS and Weron implementations create the same output sample (within relative error) when used with the same uniform deviate and exponential deviate, and the deviates are not extreme. This demonstrates they are rearrangements. These edge cases of NaN output from the CMS formula are extremely rare unless the alpha term approaches 0. The following shows a table of the number of times the term d (computed from z) is infinite: {noformat} Beta Alpha 1.0 0.5 0.25 0.1 0.0 0.05 0.0 0.0 0.0 0.0 0.0 0.02 6.752E-7 6.836E-7 6.761E-7 6.612E-7 7.404E-7 0.01 0.0008493 0.0008533 0.0008541 0.0008546 0.0008539 0.005 0.02889 0.02896 0.02897 0.02897 0.02897 0.001 0.3901 0.3903 0.3903 0.3903 0.3903 {noformat} When beta=+/-1 the term must also be restricted to the expected domain of the distribution. The following is the number of times the term was at or outside the domain when using Math.tan: {noformat} Beta Alpha 1.0 0.5 0.25 0.1 0.0 0.05 0.0003456 0.0 0.0 0.0 0.0 0.02 0.009033 0.0 0.0 0.0 0.0 0.01 0.004025 0.0 0.0 0.0 0.0 0.005 0.1230 0.0 0.0 0.0 0.0 0.001 0.2137 0.0 0.0 0.0 0.0 {noformat} Or when using the tan2 approximation: {noformat} Beta Alpha 1.0 0.5 0.25 0.1 0.0 0.05 0.001116 0.0 0.0 0.0 0.0 0.02 0.009026 0.0 0.0 0.0 0.0 0.01 0.1650 0.0 0.0 0.0 0.0 0.005 0.2520 0.0 0.0 0.0 0.0 0.001 0.3091 0.0 0.0 0.0 0.0 {noformat} This shows the use of the tan2 function has more numerical cancellation issues than using Math.tan. This only applies when beta=+/-1 and alpha is small. h2. Sampler output For reference the CDF can be computed using a million samples. The following shows plot for different alpha and beta parameters. As alpha -> 0 the distribution is extremely long tailed. !alpha1.3.jpg! !alpha0.5.jpg! !beta0.3.jpg! h2. Performance All 3 implementations use a uniform deviate in (0, 1), i.e. 0 must be avoided, and an exponential deviate. A baseline sampler for this computes both deviates and multiplies them together. The time for this baseline has been subtracted from the timings for the 3 implementations. Thus the timings reflect the calls to Math trigonomic, power, log and exponential functions used by the samplers. Note that when alpha=0.5 the Weron sampler has a speed increase. The formula uses two power functions with the exponent as 1/alpha and (1-alpha)/alpha. These reduce to 2 and 1 respectively when alpha=0.5 thus the power functions are much faster for this case. General Case (beta = 0.1, 0.4, 0.7, x-axis = alpha) !GeneralPerformance.jpg! Alpha = 1 (x-axis = beta) !Alpha1Performance.jpg! Beta = 0 (x-axis = alpha) !Beta0Performance.jpg! h2. A robust implementation The CMS algorithm often will compute NaNs for small alpha and also can do this for extreme random deviates when the beta term is 1 (these cases are probably around 1 in 10^15). Use of the Weron formula is simpler to correct. Note that when alpha is small the term zeta to convert the Weron formula to the 0-parameterisation is very small and will not effect the number of bits of significance in the sample. Thus a hybrid sampler is a compromise. This can use the CMS algorithm to create a sample. This will output samples in the 0-parameterisation which is continuous for alpha and beta. It is checked against the bounds of the distribution support and if not within the expected support the Weron formula is used to correct NaN or check infinite results. When alpha is small then the Weron formula can be used directly as the zeta shift has a small magnitude and will not effect output precision. A value of alpha < 0.02 is a level where approximately 1 in million samples will have an infinite term in the computation. h2. Approximate tan( x) / x The sampler using tan2 (tan( x) / x) is faster by about 20-30%. I tested the function with x in the range [0, pi/4] and the mean ULP is <0.02 from using Math.tan. This is a high level of precision. Note the same test using the original tan2 function from Chambers et al has a mean ULP of 148293.94. The approximations are 4283 and 4288 from Hart. This states the maximum relative error (verses tan) is 1e-10.66 and 1e-26.68 respectively. Since this is sampling and not a high accuracy computation of the PDF at a given point then use of the approximation seems justified by the faster sampling speed. Any output will be checked against the support and the correction of invalid samples using the Weron formula will use the JDK Math trigonomic functions. Note: I created these implementation using the AhrensDieterExponentialSampler. Many of the unit tests manipulate the RNG to create extreme output from the exponential deviate. These should be updated to use the ZigguratExponentialSampler. I will do this then open a PR for review with the implementations. > Move class from "Commons Math" > ------------------------------ > > Key: RNG-137 > URL: https://issues.apache.org/jira/browse/RNG-137 > Project: Commons RNG > Issue Type: Task > Components: sampling > Reporter: Gilles Sadowski > Assignee: Alex Herbert > Priority: Trivial > Labels: commons-math, port > Fix For: 1.4 > > Attachments: Alpha1Performance.jpg, Beta0Performance.jpg, > GeneralPerformance.jpg, alpha0.5.jpg, alpha1.3.jpg, beta0.3.jpg > > > Shouldn't CM's class > [{{StableRandomGenerator}}|https://gitbox.apache.org/repos/asf?p=commons-math.git;a=blob;f=src/main/java/org/apache/commons/math4/random/StableRandomGenerator.java;h=f3a851b9b16c47546cac8371e272dbf59a909c09;hb=HEAD] > be moved to the \{{commons-rng-sampling}} module? -- This message was sent by Atlassian Jira (v8.3.4#803005)