[
https://issues.apache.org/jira/browse/RNG-191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18062145#comment-18062145
]
Alex Herbert commented on RNG-191:
----------------------------------
h2. LXM generators are not slowed by unsigned multiply
Further to the benchmark results above, I added a dedicated benchmark for the
LXM generators using either the current unsigned multiply high or the use of
Math.unsignedMultiplyHigh/multiplyHigh. I tested this on an M2 Pro across
various JVMs. I will not show the results as they are similar to those above.
The use of the Math multiply methods are either the same speed or slower.
The conclusion is that the unsigned multiply is not the performance critical
part of the LXM generators. To test this I removed the call completely and used
a standard 64-bit multiply. This will create a broken generator with the wrong
output. These generators were either the same speed or slower than the original
implementation.
h2. Use within a sampler
It is possible that the nextLong benchmark is optimising the generator so that
the unsigned multiply can be computed in parallel on the processor pipeline. I
tested using the generator in more complex code to generate Gaussian samples:
||Sampler||RNG use||
|BoxMullerNormalizedGaussianSampler|Creates two samples with a call to
nextDouble and nextLong. Most of the time is used to map samples to a Gaussian
using Math calls to sin, cos, sqrt and log. There is no rejection rate.|
|MarsagliaNormalizedGaussianSampler|Creates 2 double samples from a unit box
and maps to a unit circle. Rejection rate is 1 - pi/4 = 21.4%. When inside the
circle a call to Math sqrt and log is used to map to a Gaussian.|
|ZigguratNormalizedGaussianSampler|Uses a look-up table approximately 98.5% of
the time with nextLong. The other branch fixes the sample output using branches
with either nextLong or nextDouble.|
|ZigguratSampler.NormalizedGaussian|Uses a look-up table approximately 98.8% of
the time with nextLong. The other branch fixes the sample output using branches
with nextLong.|
Results on an M2 Max using OpenJDK 1.8.0_472 or 21.0.9. Results in
ns/operation.
||RNG||JDK||Version||BoxMullerNormalizedGaussianSampler||MarsagliaNormalizedGaussianSampler||ZigguratNormalizedGaussianSampler||ZigguratSampler.NormalizedGaussian||
|L128_X1024_MIX|8|Original|167.04|113.28|16.80|7.29|
|L128_X1024_MIX|8|UM|170.36|118.70|16.55|6.79|
|L128_X1024_MIX|21|Original|15.09|13.26|5.14|4.61|
|L128_X1024_MIX|21|UM|14.59|12.12|4.98|4.13|
|L128_X128_MIX|8|Original|165.87|113.71|16.51|5.93|
|L128_X128_MIX|8|UM|167.37|112.62|15.12|6.07|
|L128_X128_MIX|21|Original|17.55|13.13|4.74|4.21|
|L128_X128_MIX|21|UM|15.73|11.01|4.29|3.97|
|L128_X256_MIX|8|Original|167.73|114.23|15.52|6.06|
|L128_X256_MIX|8|UM|170.72|113.64|14.91|6.06|
|L128_X256_MIX|21|Original|17.26|13.03|5.02|4.39|
|L128_X256_MIX|21|UM|15.75|11.12|4.22|4.66|
|PHILOX_4X64|8|Original|182.16|129.03|25.57|19.21|
|PHILOX_4X64|8|UM|183.91|131.07|25.71|19.67|
|PHILOX_4X64|21|Original|25.45|25.65|12.02|11.63|
|PHILOX_4X64|21|UM|20.13|20.62|4.88|5.40|
The BoxMuller and Marsaglia samplers are slow. The speed of the generator is
not critical. Use of unsigned multiply can speed up the Philox results by 20%.
A smaller speed-up is observed for the LXM generators. This is in contrast to
the benchmark of the generator on its own. This is the first consistent result
showing use of unsigned multiply can speed up LXM output.
The lookup table samplers are fast enough that the generator speed is
significant. Here use of unsigned multiply doubles the speed of the Philox
sampler. The LXM generators can get faster but the difference is relatively
small. One result is slower for the ZigguratSampler.NormalizedGaussian using
L128_X256_MIX but this is within noise of the test results. The time for the
original result on JDK 21 is 4.38 +/- 0.51 showing large variation in this
result.
h2. Conclusion
When used within a sampler the LXM family of generator show a small increase in
speed. This is not consistent across all generators. The L128_X256_MIX
generator shows slowdown more often than the other generators tested.
The Philox generator consistently increases performance with Math multiply high
methods.
I would recommend using the Math multiply high methods within the Philox
generator only. Use across the LXM family shows inconsistent results and
further testing is required.
> Use java.lang.invoke to dynamically call Math multiply high methods
> -------------------------------------------------------------------
>
> Key: RNG-191
> URL: https://issues.apache.org/jira/browse/RNG-191
> Project: Commons RNG
> Issue Type: Improvement
> Components: core
> Reporter: Alex Herbert
> Priority: Minor
>
> The following generators rely on a 128-bit multiplication result from two
> 64-bit longs:
> * L128_X128_MIX
> * L128_X256_MIX
> * L128_X1024_MIX
> * PHILOX_4X64
> Java added support to java.lang.Math to compute the upper 64-bit result with
> potential intrinsic calls to native functions:
> ||Method||JDK||Notes||
> |[multiplyHigh (Javadoc JDK
> 21)|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Math.html#multiplyHigh(long,long)]|9|Convert
> to unsigned using:
> Math.multiplyHigh(a, b) + ((a >> 63) & b) + ((b >> 63) & a)|
> |[unsignedMultiplyHigh (Javadoc JDK
> 21)|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Math.html#unsignedMultiplyHigh(long,long)]|18|
> |
> Since Commons RNG targets Java 8 these methods cannot be used. The current
> RNGs use a software implementation to compose the upper bits of the 128-bit
> result. However the methods can be used with a
> [MethodHandle|https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/invoke/MethodHandle.html]
> introduced in Java 7:
> {code:java}
> // find the method
> MethodHandle h = MethodHandles.publicLookup()
> .findStatic(Math.class,
> "unsignedMultiplyHigh",
> MethodType.methodType(long.class, long.class, long.class));
> // invoke (snippet)
> long a, b;
> try {
> long r = (long) h2.invokeExact(a, b);
> } catch (Throwable e) {
> throw new RuntimeException(e);
> }
> {code}
> Investigate the use of MethodHandle within the named RNGs.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)