[
https://issues.apache.org/jira/browse/RNG-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378213#comment-17378213
]
Alex Herbert commented on RNG-152:
----------------------------------
I created a new test that generates N samples from the Gaussian sampler
sequentially.
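To make the measured operation concrete, here is a minimal sketch of what "N samples sequentially" means. It is not the JMH benchmark itself: the class name {{SequentialSampleSketch}} is hypothetical, and {{java.util.Random.nextGaussian()}} stands in for the Commons RNG samplers so that the example runs without any dependency.

```java
import java.util.Random;
import java.util.function.DoubleSupplier;

/** Hypothetical stand-in for the benchmarked loop; not the code from the examples module. */
final class SequentialSampleSketch {

    /**
     * Draws {@code size} samples back to back with no other work in between.
     * The sum is returned so that the samples are actually consumed.
     */
    static double sequentialSample(DoubleSupplier sampler, int size) {
        double sum = 0;
        for (int i = 0; i < size; i++) {
            sum += sampler.getAsDouble();
        }
        return sum;
    }

    public static void main(String[] args) {
        // Assumption: java.util.Random replaces the ziggurat samplers under test.
        Random rng = new Random(42L);
        DoubleSupplier gaussian = rng::nextGaussian;
        // Same sizes as the (size) column of the benchmark output.
        for (int size : new int[] {1, 2, 3, 4, 5, 10, 20, 40}) {
            System.out.printf("size=%d sum=%.6f%n", size, sequentialSample(gaussian, size));
        }
    }
}
```

In the real benchmark the timing is handled by JMH; a hand-rolled loop like this is only useful for seeing the shape of the workload.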
I tried a few JDKs on Mac OS X and also on Linux. The latest update of the
JDK 11 LTS line is currently 11.0.11. On this version the original ziggurat
Gaussian sampler is faster. I also tested an earlier JDK 11 update (11.0.5 on
Mac OS X; 11.0.6 on Linux) and JDK 1.8.0_241. On these older JDKs the new
modified ziggurat Gaussian sampler is faster. The speed of the modified version
hardly changes across JDKs, whereas the original Gaussian sampler is about 2x
slower on the older JDKs.
Here is some output of the new benchmark on Linux. Results are similar on Mac
OS X:
{noformat}
Java version: 1.8.0_241, vendor: Oracle Corporation, runtime: /usr/lib/jvm/jdk1.8.0_241/jre
OS name: "linux", version: "4.15.0-147-generic", arch: "amd64", family: "unix"
Benchmark                                     (randomSourceName)  (size)       (type)  Mode  Cnt    Score   Error  Units
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       1  Gaussian128  avgt    5   13.866 ± 0.035  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       1  Gaussian256  avgt    5   12.932 ± 0.058  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       1  ModGaussian  avgt    5    8.498 ± 0.050  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       2  Gaussian128  avgt    5   24.211 ± 0.016  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       2  Gaussian256  avgt    5   22.512 ± 0.086  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       2  ModGaussian  avgt    5   14.555 ± 0.217  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       3  Gaussian128  avgt    5   32.986 ± 0.036  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       3  Gaussian256  avgt    5   30.725 ± 0.054  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       3  ModGaussian  avgt    5   19.185 ± 0.439  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       4  Gaussian128  avgt    5   42.032 ± 1.261  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       4  Gaussian256  avgt    5   38.941 ± 0.026  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       4  ModGaussian  avgt    5   24.147 ± 1.102  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       5  Gaussian128  avgt    5   42.843 ± 0.826  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       5  Gaussian256  avgt    5   39.292 ± 1.193  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       5  ModGaussian  avgt    5   27.327 ± 0.031  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      10  Gaussian128  avgt    5  105.515 ± 0.425  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      10  Gaussian256  avgt    5   98.137 ± 6.494  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      10  ModGaussian  avgt    5   59.367 ± 3.230  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      20  Gaussian128  avgt    5  199.948 ± 0.653  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      20  Gaussian256  avgt    5  184.717 ± 0.614  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      20  ModGaussian  avgt    5  117.624 ± 5.187  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      40  Gaussian128  avgt    5  386.705 ± 0.741  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      40  Gaussian256  avgt    5  359.468 ± 0.379  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      40  ModGaussian  avgt    5  230.842 ± 0.893  ns/op

Java version: 11.0.6, vendor: Oracle Corporation, runtime: /usr/lib/jvm/jdk-11.0.6
OS name: "linux", version: "4.15.0-147-generic", arch: "amd64", family: "unix"
Benchmark                                     (randomSourceName)  (size)       (type)  Mode  Cnt    Score   Error  Units
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       1  Gaussian128  avgt    5   14.431 ± 0.377  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       1  Gaussian256  avgt    5   13.187 ± 0.009  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       1  ModGaussian  avgt    5    8.502 ± 0.003  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       2  Gaussian128  avgt    5   22.739 ± 0.047  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       2  Gaussian256  avgt    5   22.181 ± 0.027  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       2  ModGaussian  avgt    5   13.638 ± 0.019  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       3  Gaussian128  avgt    5   30.951 ± 0.031  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       3  Gaussian256  avgt    5   29.363 ± 0.636  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       3  ModGaussian  avgt    5   18.271 ± 0.154  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       4  Gaussian128  avgt    5   39.263 ± 0.133  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       4  Gaussian256  avgt    5   37.688 ± 0.048  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       4  ModGaussian  avgt    5   23.247 ± 0.504  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       5  Gaussian128  avgt    5   44.450 ± 0.027  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       5  Gaussian256  avgt    5   37.076 ± 0.028  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       5  ModGaussian  avgt    5   23.489 ± 0.104  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      10  Gaussian128  avgt    5   96.015 ± 0.143  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      10  Gaussian256  avgt    5   95.475 ± 0.039  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      10  ModGaussian  avgt    5   58.090 ± 0.486  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      20  Gaussian128  avgt    5  188.257 ± 0.650  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      20  Gaussian256  avgt    5  176.258 ± 0.099  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      20  ModGaussian  avgt    5  109.538 ± 0.247  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      40  Gaussian128  avgt    5  437.148 ± 1.800  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      40  Gaussian256  avgt    5  338.579 ± 0.271  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      40  ModGaussian  avgt    5  227.296 ± 0.734  ns/op

Java version: 11.0.11, vendor: Ubuntu, runtime: /usr/lib/jvm/java-11-openjdk-amd64
OS name: "linux", version: "4.15.0-147-generic", arch: "amd64", family: "unix"
Benchmark                                     (randomSourceName)  (size)       (type)  Mode  Cnt    Score   Error  Units
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       1  Gaussian128  avgt    5    8.783 ± 0.236  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       1  Gaussian256  avgt    5    8.305 ± 0.025  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       1  ModGaussian  avgt    5    8.572 ± 0.021  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       2  Gaussian128  avgt    5   14.572 ± 0.078  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       2  Gaussian256  avgt    5   13.777 ± 0.135  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       2  ModGaussian  avgt    5   13.667 ± 0.014  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       3  Gaussian128  avgt    5   18.586 ± 0.145  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       3  Gaussian256  avgt    5   17.868 ± 0.699  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       3  ModGaussian  avgt    5   17.883 ± 0.112  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       4  Gaussian128  avgt    5   23.489 ± 0.117  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       4  Gaussian256  avgt    5   21.443 ± 0.022  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       4  ModGaussian  avgt    5   22.833 ± 0.382  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       5  Gaussian128  avgt    5   23.186 ± 0.008  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       5  Gaussian256  avgt    5   21.364 ± 0.006  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       5  ModGaussian  avgt    5   23.239 ± 0.014  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      10  Gaussian128  avgt    5   50.895 ± 0.198  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      10  Gaussian256  avgt    5   46.185 ± 0.017  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      10  ModGaussian  avgt    5   58.739 ± 0.137  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      20  Gaussian128  avgt    5   93.141 ± 0.232  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      20  Gaussian256  avgt    5   87.148 ± 0.056  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      20  ModGaussian  avgt    5  110.319 ± 0.074  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      40  Gaussian128  avgt    5  193.606 ± 0.449  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      40  Gaussian256  avgt    5  165.470 ± 0.089  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      40  ModGaussian  avgt    5  223.128 ± 0.530  ns/op
{noformat}
Note that as the size increases the performance of the Gaussian256 method is
much better than the Gaussian128. So an increase in the table size for the
ZigguratNormalizedGaussianSampler would be of benefit for all JDKs.
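As a quick check of the table-size effect, the sketch below divides the Gaussian128 score by the Gaussian256 score at the smallest and largest sizes, using the numbers from the output above. The class and method names are hypothetical; a ratio above 1 means Gaussian256 was faster.

```java
/** Hypothetical helper for reading the table-size effect off the scores above. */
final class TableSizeRatioSketch {

    /** Time ratio Gaussian128 / Gaussian256; above 1 means Gaussian256 is faster. */
    static double ratio(double gaussian128, double gaussian256) {
        return gaussian128 / gaussian256;
    }

    public static void main(String[] args) {
        String[] jdk = {"1.8.0_241", "11.0.6", "11.0.11"};
        // Scores in ns/op copied from the benchmark output:
        // {Gaussian128 size 1, Gaussian256 size 1, Gaussian128 size 40, Gaussian256 size 40}
        double[][] scores = {
            {13.866, 12.932, 386.705, 359.468},
            {14.431, 13.187, 437.148, 338.579},
            {8.783, 8.305, 193.606, 165.470},
        };
        for (int i = 0; i < jdk.length; i++) {
            double[] s = scores[i];
            System.out.printf("JDK %-10s size 1: %.3f  size 40: %.3f%n",
                jdk[i], ratio(s[0], s[1]), ratio(s[2], s[3]));
        }
    }
}
```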
I do not know what has happened between JDK 11.0.6 and JDK 11.0.11 that makes
the original ziggurat sampler faster. I looked through the [JDK 11 release
notes|https://www.oracle.com/java/technologies/javase/11all-relnotes.html] and
bug fixes for 11.0.7 onwards and could not see anything that could explain
this. It makes a case for running the latest Java version when possible.
On JDK 11.0.11 a sample size of 1 would put the speed order, from slowest to
fastest, as Gaussian128 (8.783 ns/op), ModGaussian (8.572 ns/op), Gaussian256
(8.305 ns/op). So the new modified Gaussian sampler is better than the original
sampler when used with a table size of 128. A switch to a larger table size
would help the original ziggurat sampler. This does not address the poorer
performance of the original sampler on older JDKs.
For consistency of performance across JDKs I would vote for using the modified
Gaussian sampler. If the latest JDK 11 version is to be used (or a later JDK)
then it is recommended to obtain the RNG code and run the JMH performance test
in the examples module to verify which sampler is fastest for individual
applications.
In practice this will only affect sampling where repeat invocations of the
same Gaussian sampler are required with no other code overhead, such as unit
vector generation. If Gaussian samples are required ad hoc in between executing
other code then the performance may not be different. This is the case for
samplers such as the LargeMeanPoissonSampler or the
MarsagliaTsangGammaSampler.
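For illustration, unit vector generation is a workload of this kind: each vector needs one Gaussian sample per dimension, drawn back to back, followed by a normalisation. The sketch below is a minimal version that assumes {{java.util.Random.nextGaussian()}} as a stand-in for the Commons RNG Gaussian sampler; the class name {{UnitVectorSketch}} is hypothetical and this is not the library's own implementation.

```java
import java.util.Random;

/** Hypothetical sketch: a random unit vector from sequential Gaussian samples. */
final class UnitVectorSketch {

    /** Returns a vector of the given dimension with Euclidean length 1. */
    static double[] sample(Random rng, int dimension) {
        if (dimension <= 0) {
            throw new IllegalArgumentException("dimension must be positive: " + dimension);
        }
        double[] v = new double[dimension];
        double sumSq = 0;
        // The sampler is invoked repeatedly with almost no other work in between.
        for (int i = 0; i < dimension; i++) {
            double x = rng.nextGaussian();
            v[i] = x;
            sumSq += x * x;
        }
        if (sumSq == 0) {
            // All samples were exactly zero (extremely unlikely); try again.
            return sample(rng, dimension);
        }
        double scale = 1.0 / Math.sqrt(sumSq);
        for (int i = 0; i < dimension; i++) {
            v[i] *= scale;
        }
        return v;
    }

    public static void main(String[] args) {
        double[] v = sample(new Random(123L), 3);
        double norm = Math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
        System.out.printf("norm = %.12f%n", norm);
    }
}
```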
> Update sampling to use ZigguratSampler.NormalizedGaussian
> ---------------------------------------------------------
>
> Key: RNG-152
> URL: https://issues.apache.org/jira/browse/RNG-152
> Project: Commons RNG
> Issue Type: Improvement
> Components: sampling
> Reporter: Alex Herbert
> Priority: Minor
>
> The new ZigguratSampler.NormalizedGaussian has better performance than the
> current ZigguratNormalizedGaussianSampler on JDK 8 and no worse performance
> on later JDK platforms.
> Current samplers using a Gaussian distribution should update to the new
> ZigguratSampler.NormalizedGaussian.