[
https://issues.apache.org/jira/browse/RNG-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17377299#comment-17377299
]
Alex Herbert commented on RNG-152:
----------------------------------
Current (selected) classes using the ZigguratNormalizedGaussianSampler:
{noformat}
> git grep --name-only ZigguratNormalizedGaussianSampler
commons-rng-examples/examples-jmh/src/main/java/org/apache/commons/rng/examples/jmh/sampling/UnitSphereSamplerBenchmark.java
commons-rng-examples/examples-jmh/src/main/java/org/apache/commons/rng/examples/jmh/sampling/distribution/ZigguratSamplerPerformance.java
commons-rng-examples/examples-jmh/src/main/java/org/apache/commons/rng/examples/jmh/sampling/shape/UnitBallSamplerBenchmark.java
commons-rng-examples/examples-jpms/jpms-lib/src/main/java/org/apache/commons/rng/examples/jpms/lib/DiceGame.java
commons-rng-examples/examples-sampling/src/main/java/org/apache/commons/rng/examples/sampling/ProbabilityDensityApproximationCommand.java
commons-rng-examples/examples-sampling/src/main/java/org/apache/commons/rng/examples/sampling/UniformSamplingVisualCheckCommand.java
commons-rng-sampling/src/main/java/org/apache/commons/rng/sampling/UnitSphereSampler.java
commons-rng-sampling/src/main/java/org/apache/commons/rng/sampling/distribution/AhrensDieterMarsagliaTsangGammaSampler.java
commons-rng-sampling/src/main/java/org/apache/commons/rng/sampling/distribution/LargeMeanPoissonSampler.java
commons-rng-sampling/src/main/java/org/apache/commons/rng/sampling/distribution/LevySampler.java
commons-rng-sampling/src/main/java/org/apache/commons/rng/sampling/shape/UnitBallSampler.java
{noformat}
Five samplers currently rely on normalised Gaussian deviates. There are
performance tests for each. The following shows the output on JDK 11.0.11 where
the ZigguratNormalizedGaussianSampler and ZigguratSampler.NormalizedGaussian
are known to have very similar performance (the latter is slightly faster).
There is a summary table after the raw results.
{noformat}
Benchmark                          (size)               (type)  Mode  Cnt     Score     Error  Units
UnitBallSamplerBenchmark.create3D     100             Baseline  avgt    5   694.334 ±  29.138  ns/op
UnitBallSamplerBenchmark.create3D     100            BallPoint  avgt    5  4367.410 ±  81.028  ns/op
UnitBallSamplerBenchmark.create3D     100  HypersphereInternal  avgt    5  7146.131 ±  51.687  ns/op
UnitBallSamplerBenchmark.create3D     100   HypersphereDiscard  avgt    5  3333.110 ±  53.220  ns/op

Benchmark                            (size)    (type)  Mode  Cnt     Score     Error  Units
UnitSphereSamplerBenchmark.create3D     100  Baseline  avgt    5   701.306 ±  44.953  ns/op
UnitSphereSamplerBenchmark.create3D     100     Array  avgt    5  3924.162 ±  36.239  ns/op
UnitSphereSamplerBenchmark.create3D     100  NonArray  avgt    5  2281.260 ±  39.767  ns/op

Benchmark                                (randomSourceName)               (samplerType)  Mode  Cnt   Score   Error  Units
ContinuousSamplersPerformance.baseline                  N/A                         N/A  avgt    5   2.506 ± 0.004  ns/op
ContinuousSamplersPerformance.sample    XO_RO_SHI_RO_128_PP                 LevySampler  avgt    5  10.982 ± 0.023  ns/op
ContinuousSamplersPerformance.sample    XO_RO_SHI_RO_128_PP  MarsagliaTsangGammaSampler  avgt    5  16.359 ± 0.066  ns/op

Benchmark                              (randomSourceName)            (samplerType)  Mode  Cnt   Score   Error  Units
DiscreteSamplersPerformance.baseline                  N/A                      N/A  avgt    5   2.278 ± 0.014  ns/op
DiscreteSamplersPerformance.sample    XO_RO_SHI_RO_128_PP  LargeMeanPoissonSampler  avgt    5  57.560 ± 2.722  ns/op
{noformat}
Updated to use the ZigguratSampler.NormalizedGaussian:
{noformat}
Benchmark                          (size)               (type)  Mode  Cnt     Score     Error  Units
UnitBallSamplerBenchmark.create3D     100             Baseline  avgt    5   695.305 ±  33.421  ns/op
UnitBallSamplerBenchmark.create3D     100            BallPoint  avgt    5  5214.528 ±  18.863  ns/op
UnitBallSamplerBenchmark.create3D     100  HypersphereInternal  avgt    5  7754.456 ±  62.132  ns/op
UnitBallSamplerBenchmark.create3D     100   HypersphereDiscard  avgt    5  3886.132 ± 287.409  ns/op

Benchmark                            (size)    (type)  Mode  Cnt     Score     Error  Units
UnitSphereSamplerBenchmark.create3D     100  Baseline  avgt    5   692.986 ±  11.626  ns/op
UnitSphereSamplerBenchmark.create3D     100     Array  avgt    5  3773.485 ±  71.889  ns/op
UnitSphereSamplerBenchmark.create3D     100  NonArray  avgt    5  2789.351 ±  48.237  ns/op

Benchmark                                (randomSourceName)               (samplerType)  Mode  Cnt   Score   Error  Units
ContinuousSamplersPerformance.baseline                  N/A                         N/A  avgt    5   2.507 ± 0.004  ns/op
ContinuousSamplersPerformance.sample    XO_RO_SHI_RO_128_PP                 LevySampler  avgt    5  10.679 ± 0.025  ns/op
ContinuousSamplersPerformance.sample    XO_RO_SHI_RO_128_PP  MarsagliaTsangGammaSampler  avgt    5  15.633 ± 0.046  ns/op

Benchmark                              (randomSourceName)            (samplerType)  Mode  Cnt   Score   Error  Units
DiscreteSamplersPerformance.baseline                  N/A                      N/A  avgt    5   2.270 ± 0.009  ns/op
DiscreteSamplersPerformance.sample    XO_RO_SHI_RO_128_PP  LargeMeanPoissonSampler  avgt    5  55.165 ± 0.143  ns/op
{noformat}
Summary:
|*Benchmark*|*Sampler*|*Old*|*Baseline*|*Adjusted*|*New*|*Baseline*|*Adjusted*|*Relative*|
|UnitBallSamplerBenchmark.create3D|BallPoint|4367.41|694.334|3673.076|5214.528|695.305|4519.223|1.230|
|UnitBallSamplerBenchmark.create3D|HypersphereInternal|7146.131|694.334|6451.797|7754.456|695.305|7059.151|1.094|
|UnitBallSamplerBenchmark.create3D|HypersphereDiscard|3333.11|694.334|2638.776|3886.132|695.305|3190.827|*1.209*|
|UnitSphereSamplerBenchmark.create3D|Array|3924.162|701.306|3222.856|3773.485|692.986|3080.499|0.956|
|UnitSphereSamplerBenchmark.create3D|NonArray|2281.26|701.306|1579.954|2789.351|692.986|2096.365|*1.327*|
|ContinuousSamplersPerformance.sample|LevySampler|10.982|2.506|8.476|10.679|2.507|8.172|*0.964*|
|ContinuousSamplersPerformance.sample|MarsagliaTsangGammaSampler|16.359|2.506|13.853|15.633|2.507|13.126|*0.948*|
|DiscreteSamplersPerformance.sample|LargeMeanPoissonSampler|57.56|2.278|55.282|55.165|2.27|52.895|*0.957*|
So the distribution samplers are marginally faster with the new normalized
Gaussian sampler. The Ball and Sphere samplers are slower (the methods in the
main code use the fastest method: UnitBall.HypersphereDiscard and
UnitSphere.NonArray, shown in bold).
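The Adjusted and Relative columns in the summary are consistent with subtracting the baseline from each raw score and then dividing the new adjusted score by the old one. A minimal sketch of that arithmetic follows; the class and method names are hypothetical and are not part of Commons RNG.

```java
/** Hypothetical helper reproducing the summary table arithmetic. */
public final class RelativeScore {
    private RelativeScore() {}

    /** Subtracts the baseline (benchmark overhead) from a raw JMH score. */
    public static double adjusted(double score, double baseline) {
        return score - baseline;
    }

    /**
     * Ratio of the new adjusted score to the old adjusted score.
     * Values below 1 mean the new sampler is faster.
     */
    public static double relative(double oldScore, double oldBaseline,
                                  double newScore, double newBaseline) {
        return adjusted(newScore, newBaseline) / adjusted(oldScore, oldBaseline);
    }

    public static void main(String[] args) {
        // UnitSphereSamplerBenchmark.create3D, NonArray, JDK 11.0.11 (scores from the raw results)
        double r = relative(2281.260, 701.306, 2789.351, 692.986);
        System.out.println("Relative: " + r);
    }
}
```

With the NonArray scores this gives a ratio of roughly 1.33, matching the table row.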
When sampling in 3D the Ball sampler HypersphereDiscard method generates 5
Gaussian samples on consecutive lines, i.e. no other code executes between the
calls. The current ZigguratNormalizedGaussianSampler has a single table of size
128. The new ZigguratSampler.NormalizedGaussian has 4 tables of size 256. So
some cache optimisation may be occurring that favours the old sampler for
repeat invocations. This requires more investigation.
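To illustrate the access pattern, here is a rough sketch of the hypersphere-discard idea in 3D: sample a point on the unit sphere in 5 dimensions from 5 Gaussian deviates and keep the first 3 coordinates. This is not the Commons RNG code; the class name is hypothetical and java.util.Random.nextGaussian() is an assumed stand-in for the normalized Gaussian sampler.

```java
import java.util.Random;

/** Hypothetical sketch of 3D ball sampling by discarding coordinates. */
public final class BallPointSketch {
    private BallPointSketch() {}

    /** Returns a point inside the unit ball in 3D. */
    public static double[] sample3D(Random rng) {
        while (true) {
            // Five consecutive Gaussian deviates; no other work between the calls.
            // Assumption: nextGaussian() stands in for the ziggurat sampler.
            final double x = rng.nextGaussian();
            final double y = rng.nextGaussian();
            final double z = rng.nextGaussian();
            final double a = rng.nextGaussian();
            final double b = rng.nextGaussian();
            final double sum = x * x + y * y + z * z + a * a + b * b;
            if (sum == 0) {
                // Cannot normalize a zero vector; sample again.
                continue;
            }
            // Scale onto the unit sphere in 5D, then drop the last two coordinates.
            final double f = 1.0 / Math.sqrt(sum);
            return new double[] {x * f, y * f, z * f};
        }
    }

    public static void main(String[] args) {
        double[] p = sample3D(new Random());
        System.out.println(p[0] + ", " + p[1] + ", " + p[2]);
    }
}
```

The five Gaussian calls run back to back, which is the situation where the table size of the underlying sampler could plausibly affect cache behaviour.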
Looking at JDK 8 for the Ball and Sphere where the old Gaussian sampler is
known to be slower:
{noformat}
Benchmark                          (size)               (type)  Mode  Cnt     Score     Error  Units
UnitBallSamplerBenchmark.create3D     100             Baseline  avgt    5   859.930 ± 184.008  ns/op
UnitBallSamplerBenchmark.create3D     100            BallPoint  avgt    5  5792.806 ± 496.783  ns/op
UnitBallSamplerBenchmark.create3D     100  HypersphereInternal  avgt    5  9885.571 ± 122.684  ns/op
UnitBallSamplerBenchmark.create3D     100   HypersphereDiscard  avgt    5  5254.453 ± 469.667  ns/op

Benchmark                            (size)    (type)  Mode  Cnt     Score     Error  Units
UnitSphereSamplerBenchmark.create3D     100  Baseline  avgt    5   861.789 ± 213.817  ns/op
UnitSphereSamplerBenchmark.create3D     100     Array  avgt    5  5086.951 ± 529.301  ns/op
UnitSphereSamplerBenchmark.create3D     100  NonArray  avgt    5  3628.031 ± 224.229  ns/op
{noformat}
Updated to use the ZigguratSampler.NormalizedGaussian:
{noformat}
Benchmark                          (size)               (type)  Mode  Cnt     Score     Error  Units
UnitBallSamplerBenchmark.create3D     100             Baseline  avgt    5   831.179 ± 114.888  ns/op
UnitBallSamplerBenchmark.create3D     100            BallPoint  avgt    5  5477.076 ± 317.317  ns/op
UnitBallSamplerBenchmark.create3D     100  HypersphereInternal  avgt    5  9784.348 ±  48.112  ns/op
UnitBallSamplerBenchmark.create3D     100   HypersphereDiscard  avgt    5  3882.520 ±  43.865  ns/op

Benchmark                            (size)    (type)  Mode  Cnt     Score     Error  Units
UnitSphereSamplerBenchmark.create3D     100  Baseline  avgt    5   818.970 ± 168.874  ns/op
UnitSphereSamplerBenchmark.create3D     100     Array  avgt    5  4284.278 ± 410.146  ns/op
UnitSphereSamplerBenchmark.create3D     100  NonArray  avgt    5  2932.109 ±  24.970  ns/op
{noformat}
Summary:
|*Benchmark*|*Sampler*|*Old*|*Baseline*|*Adjusted*|*New*|*Baseline*|*Adjusted*|*Relative*|
|UnitBallSamplerBenchmark.create3D|BallPoint|5792.806|859.93|4932.876|5477.076|831.179|4645.897|0.942|
|UnitBallSamplerBenchmark.create3D|HypersphereInternal|9885.571|859.93|9025.641|9784.348|831.179|8953.169|0.992|
|UnitBallSamplerBenchmark.create3D|HypersphereDiscard|5254.453|859.93|4394.523|3882.52|831.179|3051.341|*0.694*|
|UnitSphereSamplerBenchmark.create3D|Array|5086.951|861.789|4225.162|4284.278|818.97|3465.308|0.820|
|UnitSphereSamplerBenchmark.create3D|NonArray|3628.031|861.789|2766.242|2932.109|818.97|2113.139|*0.764*|
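Here the change either makes little difference (the UnitBall BallPoint and
HypersphereInternal methods) or makes the sampling faster (the UnitBall
HypersphereDiscard method and both UnitSphere methods). The methods used in
the main code, shown in bold, are both faster.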
h2. Overall
# The change to the ZigguratSampler.NormalizedGaussian is of marginal
benefit on JDK 11.0.11 for the distribution samplers.
# The change makes the UnitBall and UnitSphere samplers faster on JDK 8.
# The change makes the UnitBall and UnitSphere samplers slower on JDK 11.0.11.
I plan to investigate point 3. It may be related to cache size and the table
sizes of the Gaussian samplers. If so then it may be better to use the current
ZigguratNormalizedGaussianSampler during repeat creation of Gaussian deviates
with no other code, for example filling an array with Gaussian deviates. A
simple benchmark should be able to test this theory.
> Update sampling to use ZigguratSampler.NormalizedGaussian
> ---------------------------------------------------------
>
> Key: RNG-152
> URL: https://issues.apache.org/jira/browse/RNG-152
> Project: Commons RNG
> Issue Type: Improvement
> Components: sampling
> Reporter: Alex Herbert
> Priority: Minor
>
> The new ZigguratSampler.NormalizedGaussian has better performance than the
> current ZigguratNormalizedGaussianSampler on JDK 8 and no worse performance
> on later JDK platforms.
> Current samplers using a Gaussian distribution should update to the new
> ZigguratSampler.NormalizedGaussian.