[ 
https://issues.apache.org/jira/browse/RNG-152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378213#comment-17378213
 ] 

Alex Herbert commented on RNG-152:
----------------------------------

I created a new test that generates N samples from the Gaussian sampler 
sequentially.
I tried a few JDKs on Mac OS X and also on Linux. The latest release of the 
JDK 11 LTS line is currently 11.0.11. On this version the original ziggurat 
Gaussian sampler is faster. I also had older versions available: JDK 11.0.5 on 
Mac OS X, JDK 11.0.6 on Linux, and JDK 1.8.0_241. On these older JDKs the new 
modified ziggurat Gaussian sampler is faster. The speed of the modified version 
is almost the same on every JDK. It is the original Gaussian sampler that is 
about 2x slower on the older JDKs.
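
To make "N samples sequentially" concrete, here is a minimal sketch of the 
kind of workload being timed. It is not the benchmark code itself: the class 
name SequentialSampleSketch is hypothetical, java.util.Random stands in for the 
Commons RNG samplers, and the JMH harness is left out.

```java
import java.util.Random;
import java.util.function.DoubleSupplier;

/** Sketch of a "sequential sample" workload: draw N values back to back. */
final class SequentialSampleSketch {

    /**
     * Draws {@code size} samples in a tight loop and returns their sum.
     * Returning the sum gives the loop an observable result, so a JIT
     * compiler cannot discard the sampling as dead code.
     */
    static double sequentialSample(DoubleSupplier sampler, int size) {
        double sum = 0;
        for (int i = 0; i < size; i++) {
            sum += sampler.getAsDouble();
        }
        return sum;
    }

    public static void main(String[] args) {
        // Assumption: java.util.Random is only a stand-in Gaussian source,
        // not one of the ziggurat samplers compared in this comment.
        Random rng = new Random(42L);
        for (int size : new int[] {1, 2, 3, 4, 5, 10, 20, 40}) {
            double sum = sequentialSample(rng::nextGaussian, size);
            System.out.printf("size=%d sum=%f%n", size, sum);
        }
    }
}
```

The sizes in main mirror the (size) parameter of the benchmark output.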

Here is some output of the new benchmark on Linux. Results are similar on Mac 
OS X:

{noformat}
Java version: 1.8.0_241, vendor: Oracle Corporation, runtime: /usr/lib/jvm/jdk1.8.0_241/jre
OS name: "linux", version: "4.15.0-147-generic", arch: "amd64", family: "unix"

Benchmark                                     (randomSourceName)  (size)       (type)  Mode  Cnt    Score   Error  Units
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       1  Gaussian128  avgt    5   13.866 ± 0.035  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       1  Gaussian256  avgt    5   12.932 ± 0.058  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       1  ModGaussian  avgt    5    8.498 ± 0.050  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       2  Gaussian128  avgt    5   24.211 ± 0.016  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       2  Gaussian256  avgt    5   22.512 ± 0.086  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       2  ModGaussian  avgt    5   14.555 ± 0.217  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       3  Gaussian128  avgt    5   32.986 ± 0.036  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       3  Gaussian256  avgt    5   30.725 ± 0.054  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       3  ModGaussian  avgt    5   19.185 ± 0.439  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       4  Gaussian128  avgt    5   42.032 ± 1.261  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       4  Gaussian256  avgt    5   38.941 ± 0.026  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       4  ModGaussian  avgt    5   24.147 ± 1.102  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       5  Gaussian128  avgt    5   42.843 ± 0.826  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       5  Gaussian256  avgt    5   39.292 ± 1.193  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       5  ModGaussian  avgt    5   27.327 ± 0.031  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      10  Gaussian128  avgt    5  105.515 ± 0.425  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      10  Gaussian256  avgt    5   98.137 ± 6.494  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      10  ModGaussian  avgt    5   59.367 ± 3.230  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      20  Gaussian128  avgt    5  199.948 ± 0.653  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      20  Gaussian256  avgt    5  184.717 ± 0.614  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      20  ModGaussian  avgt    5  117.624 ± 5.187  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      40  Gaussian128  avgt    5  386.705 ± 0.741  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      40  Gaussian256  avgt    5  359.468 ± 0.379  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      40  ModGaussian  avgt    5  230.842 ± 0.893  ns/op

Java version: 11.0.6, vendor: Oracle Corporation, runtime: /usr/lib/jvm/jdk-11.0.6
OS name: "linux", version: "4.15.0-147-generic", arch: "amd64", family: "unix"

Benchmark                                     (randomSourceName)  (size)       (type)  Mode  Cnt    Score   Error  Units
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       1  Gaussian128  avgt    5   14.431 ± 0.377  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       1  Gaussian256  avgt    5   13.187 ± 0.009  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       1  ModGaussian  avgt    5    8.502 ± 0.003  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       2  Gaussian128  avgt    5   22.739 ± 0.047  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       2  Gaussian256  avgt    5   22.181 ± 0.027  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       2  ModGaussian  avgt    5   13.638 ± 0.019  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       3  Gaussian128  avgt    5   30.951 ± 0.031  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       3  Gaussian256  avgt    5   29.363 ± 0.636  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       3  ModGaussian  avgt    5   18.271 ± 0.154  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       4  Gaussian128  avgt    5   39.263 ± 0.133  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       4  Gaussian256  avgt    5   37.688 ± 0.048  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       4  ModGaussian  avgt    5   23.247 ± 0.504  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       5  Gaussian128  avgt    5   44.450 ± 0.027  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       5  Gaussian256  avgt    5   37.076 ± 0.028  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       5  ModGaussian  avgt    5   23.489 ± 0.104  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      10  Gaussian128  avgt    5   96.015 ± 0.143  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      10  Gaussian256  avgt    5   95.475 ± 0.039  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      10  ModGaussian  avgt    5   58.090 ± 0.486  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      20  Gaussian128  avgt    5  188.257 ± 0.650  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      20  Gaussian256  avgt    5  176.258 ± 0.099  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      20  ModGaussian  avgt    5  109.538 ± 0.247  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      40  Gaussian128  avgt    5  437.148 ± 1.800  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      40  Gaussian256  avgt    5  338.579 ± 0.271  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      40  ModGaussian  avgt    5  227.296 ± 0.734  ns/op

Java version: 11.0.11, vendor: Ubuntu, runtime: /usr/lib/jvm/java-11-openjdk-amd64
OS name: "linux", version: "4.15.0-147-generic", arch: "amd64", family: "unix"

Benchmark                                     (randomSourceName)  (size)       (type)  Mode  Cnt    Score   Error  Units
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       1  Gaussian128  avgt    5    8.783 ± 0.236  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       1  Gaussian256  avgt    5    8.305 ± 0.025  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       1  ModGaussian  avgt    5    8.572 ± 0.021  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       2  Gaussian128  avgt    5   14.572 ± 0.078  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       2  Gaussian256  avgt    5   13.777 ± 0.135  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       2  ModGaussian  avgt    5   13.667 ± 0.014  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       3  Gaussian128  avgt    5   18.586 ± 0.145  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       3  Gaussian256  avgt    5   17.868 ± 0.699  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       3  ModGaussian  avgt    5   17.883 ± 0.112  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       4  Gaussian128  avgt    5   23.489 ± 0.117  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       4  Gaussian256  avgt    5   21.443 ± 0.022  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       4  ModGaussian  avgt    5   22.833 ± 0.382  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       5  Gaussian128  avgt    5   23.186 ± 0.008  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       5  Gaussian256  avgt    5   21.364 ± 0.006  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP       5  ModGaussian  avgt    5   23.239 ± 0.014  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      10  Gaussian128  avgt    5   50.895 ± 0.198  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      10  Gaussian256  avgt    5   46.185 ± 0.017  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      10  ModGaussian  avgt    5   58.739 ± 0.137  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      20  Gaussian128  avgt    5   93.141 ± 0.232  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      20  Gaussian256  avgt    5   87.148 ± 0.056  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      20  ModGaussian  avgt    5  110.319 ± 0.074  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      40  Gaussian128  avgt    5  193.606 ± 0.449  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      40  Gaussian256  avgt    5  165.470 ± 0.089  ns/op
ZigguratSamplerPerformance.sequentialSample  XO_RO_SHI_RO_128_PP      40  ModGaussian  avgt    5  223.128 ± 0.530  ns/op
{noformat}

Note that as the size increases the Gaussian256 method pulls further ahead of 
Gaussian128: at size 40 it is roughly 7% faster on JDK 1.8.0_241, 23% faster on 
JDK 11.0.6 and 15% faster on JDK 11.0.11. So an increase in the table size for 
the ZigguratNormalizedGaussianSampler would be of benefit for all JDKs.
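
The gap between the two table sizes can be quantified directly from the 
size 40 rows of the output above. The following sketch only does that 
arithmetic; the class and method names are hypothetical and the scores are 
copied from the benchmark tables.

```java
/** Relative speed of Gaussian256 vs Gaussian128 at size 40 (scores in ns/op). */
final class TableSizeComparison {

    /** Percentage by which {@code faster} improves on {@code slower}. */
    static double percentFaster(double slower, double faster) {
        return 100.0 * (slower - faster) / slower;
    }

    public static void main(String[] args) {
        // Size 40 scores copied from the three benchmark tables above.
        String[] jdk = {"1.8.0_241", "11.0.6", "11.0.11"};
        double[] gaussian128 = {386.705, 437.148, 193.606};
        double[] gaussian256 = {359.468, 338.579, 165.470};
        for (int i = 0; i < jdk.length; i++) {
            System.out.printf("JDK %s: Gaussian256 is %.1f%% faster%n",
                jdk[i], percentFaster(gaussian128[i], gaussian256[i]));
        }
    }
}
```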

I do not know what has happened between JDK 11.0.6 and JDK 11.0.11 that makes 
the original ziggurat sampler faster. I looked through the [JDK 11 release 
notes|https://www.oracle.com/java/technologies/javase/11all-relnotes.html] and 
bug fixes for 11.0.7 onwards and could not see anything that could explain 
this. It makes a case for running the latest Java version when possible.

On JDK 11.0.11 a sample size of 1 would put the speed order, slowest first, as 
Gaussian128, ModGaussian, Gaussian256. So the new modified Gaussian sampler is 
faster than the original sampler when the original uses a table size of 128. A 
switch to a larger table size would help the original ziggurat sampler, but it 
would not address the poorer performance of the original sampler on older JDKs.

For consistency of performance across JDKs I would vote for using the modified 
Gaussian sampler. If the latest JDK 11 version is to be used (or a later JDK) 
then it is recommended to obtain the RNG code and run the JMH performance test 
in the examples module to verify which sampler is fastest for individual 
applications.

In practical applications this will only affect sampling where repeated 
invocations of the same Gaussian sampler are required with no other code 
overhead, such as unit vector generation. If Gaussian samples are required ad 
hoc in between executing other code then the performance may not be different. 
The latter is the case for samplers such as the LargeMeanPoissonSampler or the 
MarsagliaTsangGammaSampler.
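
As an illustration of the "repeated invocations with no other overhead" case, 
here is a minimal sketch of unit vector generation by normalizing a vector of 
Gaussian samples. It is not the Commons RNG implementation: the class name 
UnitVectorSketch is hypothetical and java.util.Random stands in for the 
Gaussian sampler.

```java
import java.util.Random;
import java.util.function.DoubleSupplier;

/** Sketch: a random direction from back-to-back Gaussian samples. */
final class UnitVectorSketch {

    /** Fills a vector with Gaussian samples and scales it to length 1. */
    static double[] unitVector(DoubleSupplier gaussian, int dimension) {
        if (dimension <= 0) {
            throw new IllegalArgumentException("dimension must be positive");
        }
        double[] v = new double[dimension];
        double sumSq;
        do {
            sumSq = 0;
            // The sampler is called 'dimension' times with almost no other
            // work in between, so sampler speed dominates the cost.
            for (int i = 0; i < dimension; i++) {
                v[i] = gaussian.getAsDouble();
                sumSq += v[i] * v[i];
            }
        } while (sumSq == 0); // resample the (very unlikely) zero vector
        double scale = 1.0 / Math.sqrt(sumSq);
        for (int i = 0; i < dimension; i++) {
            v[i] *= scale;
        }
        return v;
    }

    public static void main(String[] args) {
        // Assumption: java.util.Random is only a stand-in Gaussian source.
        Random rng = new Random(123L);
        double[] v = unitVector(rng::nextGaussian, 3);
        System.out.printf("(%f, %f, %f)%n", v[0], v[1], v[2]);
    }
}
```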



> Update sampling to use ZigguratSampler.NormalizedGaussian
> ---------------------------------------------------------
>
>                 Key: RNG-152
>                 URL: https://issues.apache.org/jira/browse/RNG-152
>             Project: Commons RNG
>          Issue Type: Improvement
>          Components: sampling
>            Reporter: Alex Herbert
>            Priority: Minor
>
> The new ZigguratSampler.NormalizedGaussian has better performance than the 
> current ZigguratNormalizedGaussianSampler on JDK 8 and no worse performance 
> on later JDK platforms.
> Current samplers using a Gaussian distribution should update to the new 
> ZigguratSampler.NormalizedGaussian.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
