[ 
https://issues.apache.org/jira/browse/RNG-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377299#comment-17377299
 ] 

Alex Herbert commented on RNG-152:
----------------------------------

Current (selected) classes using the ZigguratNormalizedGaussianSampler:
{noformat}
> git grep --name-only ZigguratNormalizedGaussianSampler
commons-rng-examples/examples-jmh/src/main/java/org/apache/commons/rng/examples/jmh/sampling/UnitSphereSamplerBenchmark.java
commons-rng-examples/examples-jmh/src/main/java/org/apache/commons/rng/examples/jmh/sampling/distribution/ZigguratSamplerPerformance.java
commons-rng-examples/examples-jmh/src/main/java/org/apache/commons/rng/examples/jmh/sampling/shape/UnitBallSamplerBenchmark.java
commons-rng-examples/examples-jpms/jpms-lib/src/main/java/org/apache/commons/rng/examples/jpms/lib/DiceGame.java
commons-rng-examples/examples-sampling/src/main/java/org/apache/commons/rng/examples/sampling/ProbabilityDensityApproximationCommand.java
commons-rng-examples/examples-sampling/src/main/java/org/apache/commons/rng/examples/sampling/UniformSamplingVisualCheckCommand.java

commons-rng-sampling/src/main/java/org/apache/commons/rng/sampling/UnitSphereSampler.java
commons-rng-sampling/src/main/java/org/apache/commons/rng/sampling/distribution/AhrensDieterMarsagliaTsangGammaSampler.java
commons-rng-sampling/src/main/java/org/apache/commons/rng/sampling/distribution/LargeMeanPoissonSampler.java
commons-rng-sampling/src/main/java/org/apache/commons/rng/sampling/distribution/LevySampler.java
commons-rng-sampling/src/main/java/org/apache/commons/rng/sampling/shape/UnitBallSampler.java
{noformat}
Five samplers currently rely on normalised Gaussian deviates, and there are 
performance tests for each. The following shows the output on JDK 11.0.11, where 
the ZigguratNormalizedGaussianSampler and ZigguratSampler.NormalizedGaussian 
are known to have very similar performance (the latter is slightly faster). 
There is a summary table after the raw results.
{noformat}
Benchmark                          (size)               (type)  Mode  Cnt     Score    Error  Units
UnitBallSamplerBenchmark.create3D     100             Baseline  avgt    5   694.334 ± 29.138  ns/op
UnitBallSamplerBenchmark.create3D     100            BallPoint  avgt    5  4367.410 ± 81.028  ns/op
UnitBallSamplerBenchmark.create3D     100  HypersphereInternal  avgt    5  7146.131 ± 51.687  ns/op
UnitBallSamplerBenchmark.create3D     100   HypersphereDiscard  avgt    5  3333.110 ± 53.220  ns/op

Benchmark                            (size)    (type)  Mode  Cnt     Score    Error  Units
UnitSphereSamplerBenchmark.create3D     100  Baseline  avgt    5   701.306 ± 44.953  ns/op
UnitSphereSamplerBenchmark.create3D     100     Array  avgt    5  3924.162 ± 36.239  ns/op
UnitSphereSamplerBenchmark.create3D     100  NonArray  avgt    5  2281.260 ± 39.767  ns/op

Benchmark                                (randomSourceName)               (samplerType)  Mode  Cnt   Score   Error  Units
ContinuousSamplersPerformance.baseline                  N/A                         N/A  avgt    5   2.506 ± 0.004  ns/op
ContinuousSamplersPerformance.sample    XO_RO_SHI_RO_128_PP                 LevySampler  avgt    5  10.982 ± 0.023  ns/op
ContinuousSamplersPerformance.sample    XO_RO_SHI_RO_128_PP  MarsagliaTsangGammaSampler  avgt    5  16.359 ± 0.066  ns/op

Benchmark                              (randomSourceName)            (samplerType)  Mode  Cnt   Score   Error  Units
DiscreteSamplersPerformance.baseline                  N/A                      N/A  avgt    5   2.278 ± 0.014  ns/op
DiscreteSamplersPerformance.sample    XO_RO_SHI_RO_128_PP  LargeMeanPoissonSampler  avgt    5  57.560 ± 2.722  ns/op
{noformat}
Updated to use the ZigguratSampler.NormalizedGaussian:
{noformat}
Benchmark                          (size)               (type)  Mode  Cnt     Score     Error  Units
UnitBallSamplerBenchmark.create3D     100             Baseline  avgt    5   695.305 ±  33.421  ns/op
UnitBallSamplerBenchmark.create3D     100            BallPoint  avgt    5  5214.528 ±  18.863  ns/op
UnitBallSamplerBenchmark.create3D     100  HypersphereInternal  avgt    5  7754.456 ±  62.132  ns/op
UnitBallSamplerBenchmark.create3D     100   HypersphereDiscard  avgt    5  3886.132 ± 287.409  ns/op

Benchmark                            (size)    (type)  Mode  Cnt     Score    Error  Units
UnitSphereSamplerBenchmark.create3D     100  Baseline  avgt    5   692.986 ± 11.626  ns/op
UnitSphereSamplerBenchmark.create3D     100     Array  avgt    5  3773.485 ± 71.889  ns/op
UnitSphereSamplerBenchmark.create3D     100  NonArray  avgt    5  2789.351 ± 48.237  ns/op

Benchmark                                (randomSourceName)               (samplerType)  Mode  Cnt   Score   Error  Units
ContinuousSamplersPerformance.baseline                  N/A                         N/A  avgt    5   2.507 ± 0.004  ns/op
ContinuousSamplersPerformance.sample    XO_RO_SHI_RO_128_PP                 LevySampler  avgt    5  10.679 ± 0.025  ns/op
ContinuousSamplersPerformance.sample    XO_RO_SHI_RO_128_PP  MarsagliaTsangGammaSampler  avgt    5  15.633 ± 0.046  ns/op

Benchmark                              (randomSourceName)            (samplerType)  Mode  Cnt   Score   Error  Units
DiscreteSamplersPerformance.baseline                  N/A                      N/A  avgt    5   2.270 ± 0.009  ns/op
DiscreteSamplersPerformance.sample    XO_RO_SHI_RO_128_PP  LargeMeanPoissonSampler  avgt    5  55.165 ± 0.143  ns/op
{noformat}
Summary:
|*Benchmark*|*Sampler*|*Old*|*Old baseline*|*Old adjusted*|*New*|*New baseline*|*New adjusted*|*Relative (new/old)*|
|UnitBallSamplerBenchmark.create3D|BallPoint|4367.41|694.334|3673.076|5214.528|695.305|4519.223|1.230|
|UnitBallSamplerBenchmark.create3D|HypersphereInternal|7146.131|694.334|6451.797|7754.456|695.305|7059.151|1.094|
|UnitBallSamplerBenchmark.create3D|HypersphereDiscard|3333.11|694.334|2638.776|3886.132|695.305|3190.827|*1.209*|
|UnitSphereSamplerBenchmark.create3D|Array|3924.162|701.306|3222.856|3773.485|692.986|3080.499|0.956|
|UnitSphereSamplerBenchmark.create3D|NonArray|2281.26|701.306|1579.954|2789.351|692.986|2096.365|*1.327*|
|ContinuousSamplersPerformance.sample|LevySampler|10.982|2.506|8.476|10.679|2.507|8.172|*0.964*|
|ContinuousSamplersPerformance.sample|MarsagliaTsangGammaSampler|16.359|2.506|13.853|15.633|2.507|13.126|*0.948*|
|DiscreteSamplersPerformance.sample|LargeMeanPoissonSampler|57.56|2.278|55.282|55.165|2.27|52.895|*0.957*|

So the distribution samplers are marginally faster with the new normalized 
Gaussian sampler. The Ball and Sphere samplers are slower for the methods used 
in the main code, which are the fastest ones: UnitBall.HypersphereDiscard and 
UnitSphere.NonArray (shown in bold).
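
For reference, the Adjusted and Relative columns in the summary can be reproduced with the arithmetic below. This is only a sketch: the class and method names are made up for illustration and are not part of Commons RNG. It subtracts the baseline from each raw score and divides the new adjusted time by the old one, so a value below 1 means the new sampler is faster.

```java
import java.util.Locale;

/** Hypothetical helper: baseline-corrected comparison of two JMH average-time scores. */
public final class RelativeTiming {
    private RelativeTiming() {}

    /** Time attributed to the sampler: raw score minus the baseline score. */
    public static double adjusted(double score, double baseline) {
        return score - baseline;
    }

    /** Ratio of new adjusted time to old adjusted time. */
    public static double relative(double oldScore, double oldBaseline,
                                  double newScore, double newBaseline) {
        return adjusted(newScore, newBaseline) / adjusted(oldScore, oldBaseline);
    }

    public static void main(String[] args) {
        // UnitBallSamplerBenchmark.create3D, HypersphereDiscard, JDK 11.0.11 scores.
        double r = relative(3333.110, 694.334, 3886.132, 695.305);
        System.out.println(String.format(Locale.ROOT, "%.3f", r));
    }
}
```

With the HypersphereDiscard scores this gives the 1.209 reported in the table.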

When sampling in 3D the Ball sampler HypersphereDiscard method generates 5 
Gaussian samples on consecutive lines, i.e. no other code executes between 
them. The current ZigguratNormalizedGaussianSampler has a single table size of 
128. The new ZigguratSampler.NormalizedGaussian has 4 tables of size 256. So 
some cache optimisation may be occurring that favours the old sampler for 
repeat invocations. This requires more investigation.

Looking at JDK 8 for the Ball and Sphere samplers, where the old Gaussian 
sampler is known to be slower:
{noformat}
Benchmark                          (size)               (type)  Mode  Cnt     Score     Error  Units
UnitBallSamplerBenchmark.create3D     100             Baseline  avgt    5   859.930 ± 184.008  ns/op
UnitBallSamplerBenchmark.create3D     100            BallPoint  avgt    5  5792.806 ± 496.783  ns/op
UnitBallSamplerBenchmark.create3D     100  HypersphereInternal  avgt    5  9885.571 ± 122.684  ns/op
UnitBallSamplerBenchmark.create3D     100   HypersphereDiscard  avgt    5  5254.453 ± 469.667  ns/op

Benchmark                            (size)    (type)  Mode  Cnt     Score     Error  Units
UnitSphereSamplerBenchmark.create3D     100  Baseline  avgt    5   861.789 ± 213.817  ns/op
UnitSphereSamplerBenchmark.create3D     100     Array  avgt    5  5086.951 ± 529.301  ns/op
UnitSphereSamplerBenchmark.create3D     100  NonArray  avgt    5  3628.031 ± 224.229  ns/op
{noformat}
Updated to use the ZigguratSampler.NormalizedGaussian:
{noformat}
Benchmark                          (size)               (type)  Mode  Cnt     Score     Error  Units
UnitBallSamplerBenchmark.create3D     100             Baseline  avgt    5   831.179 ± 114.888  ns/op
UnitBallSamplerBenchmark.create3D     100            BallPoint  avgt    5  5477.076 ± 317.317  ns/op
UnitBallSamplerBenchmark.create3D     100  HypersphereInternal  avgt    5  9784.348 ±  48.112  ns/op
UnitBallSamplerBenchmark.create3D     100   HypersphereDiscard  avgt    5  3882.520 ±  43.865  ns/op

Benchmark                            (size)    (type)  Mode  Cnt     Score     Error  Units
UnitSphereSamplerBenchmark.create3D     100  Baseline  avgt    5   818.970 ± 168.874  ns/op
UnitSphereSamplerBenchmark.create3D     100     Array  avgt    5  4284.278 ± 410.146  ns/op
UnitSphereSamplerBenchmark.create3D     100  NonArray  avgt    5  2932.109 ±  24.970  ns/op
{noformat}
Summary:
|*Benchmark*|*Sampler*|*Old*|*Old baseline*|*Old adjusted*|*New*|*New baseline*|*New adjusted*|*Relative (new/old)*|
|UnitBallSamplerBenchmark.create3D|BallPoint|5792.806|859.93|4932.876|5477.076|831.179|4645.897|0.942|
|UnitBallSamplerBenchmark.create3D|HypersphereInternal|9885.571|859.93|9025.641|9784.348|831.179|8953.169|0.992|
|UnitBallSamplerBenchmark.create3D|HypersphereDiscard|5254.453|859.93|4394.523|3882.52|831.179|3051.341|*0.694*|
|UnitSphereSamplerBenchmark.create3D|Array|5086.951|861.789|4225.162|4284.278|818.97|3465.308|0.820|
|UnitSphereSamplerBenchmark.create3D|NonArray|3628.031|861.789|2766.242|2932.109|818.97|2113.139|*0.764*|

Here the change either makes little difference (UnitBallSampler BallPoint and 
HypersphereInternal) or makes the sampling faster (UnitBallSampler 
HypersphereDiscard and both UnitSphereSampler methods).
h2. Overall
 # The change to the ZigguratSampler.NormalizedGaussian is of marginal 
benefit on JDK 11.0.11 for the distribution samplers.
 # The change makes the UnitBall and UnitSphere samplers faster on JDK 8.
 # The change makes the UnitBall and UnitSphere samplers slower on JDK 11.0.11.

I plan to do some investigation of point 3. It may be related to cache size and 
the table sizes of the Gaussian samplers. If so, it may be better to use 
the current ZigguratNormalizedGaussianSampler when repeatedly creating 
Gaussian deviates with no other code in between, for example when filling an 
array with Gaussian deviates. A simple benchmark should be able to test this 
theory.
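
A minimal sketch of the array-fill workload such a benchmark could time is below. It is written against java.util.function.DoubleSupplier so that it stands alone. The class name, method names and sizes are hypothetical, and java.util.Random is only a stand-in: in a real test the supplier would be sampler::sample for each of the two samplers, and the timing would be done with JMH as in the existing examples-jmh module rather than with System.nanoTime.

```java
import java.util.Random;
import java.util.function.DoubleSupplier;

/** Hypothetical sketch: time back-to-back Gaussian deviates with no other work. */
public final class GaussianFillTiming {
    private GaussianFillTiming() {}

    /** Fills the array with consecutive deviates from the sampler. */
    public static double[] fill(double[] out, DoubleSupplier sampler) {
        for (int i = 0; i < out.length; i++) {
            out[i] = sampler.getAsDouble();
        }
        return out;
    }

    /** Crude average time per deviate in nanoseconds (not a substitute for JMH). */
    public static double nanosPerSample(DoubleSupplier sampler, int size, int repeats) {
        double[] out = new double[size];
        // Warm-up passes so the JIT can compile the loop before timing starts.
        for (int i = 0; i < repeats; i++) {
            fill(out, sampler);
        }
        long start = System.nanoTime();
        for (int i = 0; i < repeats; i++) {
            fill(out, sampler);
        }
        return (System.nanoTime() - start) / ((double) size * repeats);
    }

    public static void main(String[] args) {
        // Stand-in Gaussian source; swap in sampler::sample for each sampler under test.
        Random rng = new Random(42);
        System.out.println(nanosPerSample(rng::nextGaussian, 1000, 10_000));
    }
}
```

Varying the array size would show whether any difference between the two samplers depends on how much of the cache the output array occupies alongside the sampler tables.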

> Update sampling to use ZigguratSampler.NormalizedGaussian
> ---------------------------------------------------------
>
>                 Key: RNG-152
>                 URL: https://issues.apache.org/jira/browse/RNG-152
>             Project: Commons RNG
>          Issue Type: Improvement
>          Components: sampling
>            Reporter: Alex Herbert
>            Priority: Minor
>
> The new ZigguratSampler.NormalizedGaussian has better performance than the 
> current ZigguratNormalizedGaussianSampler on JDK 8 and no worse performance 
> on later JDK platforms.
> Current samplers using a Gaussian distribution should update to the new 
> ZigguratSampler.NormalizedGaussian.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
