[https://issues.apache.org/jira/browse/RNG-88?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16809735#comment-16809735]
Alex D Herbert commented on RNG-88:
-----------------------------------
I have created new JMH benchmarks for all the methods defined in
{{UniformRandomProvider}}.
A baseline implementation has been created for each method and scales linearly
with workload:
!baseline.jpg!
(Note: The {{nextBytes()}} method is plotted on the second Y-axis as the
measurement time is very different.)
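The baseline providers simply return state variables (not final values) so
that the JIT compiler cannot remove the method calls from the execution path.
A hypothetical sketch (the actual classes in the PR are obtained via
{{BaselineUtils}}):
{code:java}
import org.apache.commons.rng.UniformRandomProvider;

/**
 * Sketch of a baseline provider: every method returns a non-final field
 * (or does nothing) so the call cannot be constant-folded away.
 */
public class BaselineProvider implements UniformRandomProvider {
    /** State variables returned by the dummy methods. */
    private int intValue;
    private long longValue;
    private float floatValue;
    private double doubleValue;
    private boolean booleanValue;

    @Override public void nextBytes(byte[] bytes) { /* no-op baseline */ }
    @Override public void nextBytes(byte[] bytes, int start, int len) { /* no-op baseline */ }
    @Override public int nextInt() { return intValue; }
    @Override public int nextInt(int n) { return intValue; }
    @Override public long nextLong() { return longValue; }
    @Override public long nextLong(long n) { return longValue; }
    @Override public boolean nextBoolean() { return booleanValue; }
    @Override public float nextFloat() { return floatValue; }
    @Override public double nextDouble() { return doubleValue; }
}
{code}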
Each method from {{UniformRandomProvider}} is then benchmarked, for each
random source, against a baseline implementation. The benchmarks are split
into separate classes that share a baseline implementation, e.g.
{{NextFloatGenerationPerformance}} for {{nextFloat()}} but
{{NextIntGenerationPerformance}} for {{nextInt()}} and {{nextInt(int)}}.
An example benchmark looks like this:
{code:java}
import org.apache.commons.rng.UniformRandomProvider;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

// AbstractBenchmark, BaselineSources and BaselineUtils are helper classes
// from the same benchmark package.

/**
 * Executes a benchmark to compare the speed of generation of random numbers
 * from the various source providers for
 * {@link UniformRandomProvider#nextDouble()}.
 */
public class NextDoubleGenerationPerformance extends AbstractBenchmark {
    /**
     * The benchmark state (retrieve the various "RandomSource"s).
     */
    @State(Scope.Benchmark)
    public static class Sources extends BaselineSources {
        @Override
        protected UniformRandomProvider createBaseline() {
            return BaselineUtils.getNextDouble();
        }
    }

    /** The value. */
    private double value;

    /**
     * Baseline for a JMH method call with no return value.
     */
    @Benchmark
    public void baselineVoid() {
        // Do nothing, this is a baseline
    }

    /**
     * Baseline for a JMH method call returning a {@code double}.
     *
     * @return the value
     */
    @Benchmark
    public double baselineDouble() {
        return value;
    }

    /**
     * Exercise the {@link UniformRandomProvider#nextDouble()} method.
     *
     * @param sources Source of randomness.
     * @return the double
     */
    @Benchmark
    public double nextDouble(Sources sources) {
        return sources.getGenerator().nextDouble();
    }
}
{code}
The {{BaselineSources}} class declares all the RNG providers plus a keyword
{{BASELINE}}. When this keyword is identified it calls the method to create
the baseline; otherwise it creates the {{RandomSource}} using the standard
factory method.
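A sketch of the idea (the field and method names here are illustrative; see
the PR for the actual class):
{code:java}
import org.apache.commons.rng.UniformRandomProvider;
import org.apache.commons.rng.simple.RandomSource;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Setup;

/**
 * Sketch of the shared state. Concrete subclasses are annotated with
 * {@code @State(Scope.Benchmark)} (as in the example above) and provide
 * the baseline for the method being benchmarked.
 */
public abstract class BaselineSources {
    /** The keyword identifying the baseline implementation. */
    private static final String BASELINE = "BASELINE";

    /** All the random sources under test, plus the baseline keyword. */
    @Param({BASELINE, "JDK", "SPLIT_MIX_64", "MWC_256" /* , ... */})
    private String randomSourceName;

    /** The generator exercised by the benchmark methods. */
    private UniformRandomProvider provider;

    /** Create the generator, or the baseline when the keyword matches. */
    @Setup
    public void setup() {
        provider = BASELINE.equals(randomSourceName)
            ? createBaseline()
            : RandomSource.create(RandomSource.valueOf(randomSourceName));
    }

    /** @return the generator to exercise. */
    public UniformRandomProvider getGenerator() {
        return provider;
    }

    /** @return the baseline implementation. */
    protected abstract UniformRandomProvider createBaseline();
}
{code}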
I found stable results on my test machine using 10 warm-up iterations and 10
measurement iterations of 1 second each.
{noformat}
Intel(R) Xeon(R) CPU E5-1680 v3 @ 3.20GHz
openjdk version "1.8.0_191"
Linux gc04016493.gdsc.susx.ac.uk 4.4.0-145-generic #171-Ubuntu SMP Tue Mar 26
12:43:40 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
{noformat}
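These warm-up/measurement settings correspond to JMH annotations along the
following lines (a minimal sketch; the same values can be supplied on the JMH
command line instead):
{code:java}
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Warmup;

/** 10 warm-up and 10 measurement iterations of 1 second each. */
@Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
public class BenchmarkSettings {
    // Benchmark methods would be declared in classes carrying these
    // annotations (e.g. via a shared abstract parent).
}
{code}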
I ran all the benchmarks, subtracted the BASELINE time from the runtime
metric, then expressed the results relative to the JDK. I found the median of
10 runs to be more stable than the mean as a few outliers can skew the score.
Here are the relative median scores:
!next.png!
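(Each relative score above is computed as follows; this is a sketch of the
arithmetic, not code from the PR:)
{code:java}
/** Illustrates the normalisation used for the chart above. */
public class RelativeScore {
    /**
     * @param t        median time of the method for a given source (ns/op)
     * @param baseline median time of the BASELINE implementation (ns/op)
     * @param jdk      median time of the same method for the JDK source (ns/op)
     * @return the overhead-corrected time relative to the JDK
     */
    static double relativeScore(double t, double baseline, double jdk) {
        return (t - baseline) / (jdk - baseline);
    }
}
{code}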
These results should be compared to the [RNG User
Guide|https://commons.apache.org/proper/commons-rng/userguide/rng.html]. Those
results show that the fastest generators are SPLIT_MIX, XOR_SHIFT_1024_S,
TWO_CMRES and MWC_256, and the slowest are JDK and WELL_44497x. This chart
shows the same ranking but the differences are more pronounced.
For reference, the results for SplitMix in the user guide (v1.2) versus the
new results:
||Method||New||Old||
|nextLong|0.0806000708|0.21751|
|nextBytes|0.2179846927| |
|nextDouble|0.1183835585|0.27611|
|nextLongN|0.1803571971| |
|nextInt|0.1618788839|0.41479|
|nextFloat|0.2567586134| |
|nextIntN|0.3662874911| |
|nextBoolean|1.176949893|1.1214|
In the case of the SplitMix the JMH time-measurement overhead for
{{nextLong()}} is 63% of the total time, so subtracting the baseline makes the
differences between a fast generator and a slow one more pronounced (hence the
new relative score for {{nextLong}}, 0.081, is much smaller than the old
0.218).
I also note the strangely fast speed of {{nextBoolean()}} for the JDK
implementation: it is the fastest despite the JDK not being a fast generator
of {{int}} values. The implementation of this method is the standard one for
an {{IntProvider}}: store a single {{int}} value and then use each consecutive
bit of the 32 bits for a {{boolean}}. So the speed-up must be related to the
relative speed of testing the bits from the JDK source versus the others, and
we know from the BigCrush results that the JDK bit source is not good.
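For reference, the bit-caching scheme reads like this (an illustrative sketch
of the {{IntProvider}} approach; the exact field and method names may differ):
{code:java}
/** Sketch of serving booleans one bit at a time from a cached int. */
public abstract class BitCachingProvider {
    /** Cached 32-bit source of boolean bits. */
    private int booleanSource;
    /** Single-bit mask marking the next bit to use; zero when exhausted. */
    private int booleanBitMask;

    /** @return the next 32-bit value from the underlying generator. */
    protected abstract int next();

    /** @return a boolean using one bit of the cached value. */
    public boolean nextBoolean() {
        int bitMask = booleanBitMask << 1;
        if (bitMask == 0) {
            // All 32 bits used: refill the cache and restart at the
            // least significant bit.
            bitMask = 1;
            booleanSource = next();
        }
        booleanBitMask = bitMask;
        return (booleanSource & bitMask) != 0;
    }
}
{code}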
I've opened [PR 35|https://github.com/apache/commons-rng/pull/35] with the new
code.
This idea can be applied to other benchmarks that currently use a loop to test
performance and do not require input data/state changes on each loop
iteration. These are:
* GeometricSamplersPerformance
* SamplersPerformance
* NextGaussianPerformance
In the case of SamplersPerformance it may be better to separate the
{{DiscreteSampler}} and {{ContinuousSampler}} benchmarks into two classes that
can each have a baseline implementation of the appropriate sampler interface,
for example the sketch below.
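A baseline for the {{DiscreteSampler}} interface could be as simple as (a
hypothetical sketch):
{code:java}
import org.apache.commons.rng.sampling.distribution.DiscreteSampler;

/**
 * Hypothetical baseline sampler. As with the baseline RNG, the returned
 * value is a non-final field so the JIT compiler cannot remove the call.
 */
public class BaselineDiscreteSampler implements DiscreteSampler {
    /** A stored value. */
    private int value;

    @Override
    public int sample() {
        return value;
    }
}
{code}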
I can do this in a new PR, the same PR or directly within master.
The following benchmarks currently exploit the loop used in the benchmark
method and cannot be modified in this way:
* ThreadLocalPerformance
* PoissonSamplerCachePerformance
> Update the GenerationPerformance benchmark
> ------------------------------------------
>
> Key: RNG-88
> URL: https://issues.apache.org/jira/browse/RNG-88
> Project: Commons RNG
> Issue Type: Improvement
> Components: examples
> Affects Versions: 1.3
> Reporter: Alex D Herbert
> Assignee: Alex D Herbert
> Priority: Minor
> Attachments: baseline.jpg, next.png
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The current GenerationPerformance benchmark runs all the generators to
> collect timing data. However the act of running the test within JMH takes
> some time (test overhead). This overhead should be measured and subtracted
> from the timing data to create a time attributed only to the method exercised
> in the RNG.
> This can be done by creating a dummy RNG that returns a fixed value. The
> implementation must be done in a manner where the JIT compiler is not able to
> remove the dummy method from the execution path. This is achieved by
> returning state variables from the dummy RNG (not final variables).
> Testing can be done using a variable number of iterations and the run-times
> assessed for linearity. If the run-times scale with the number of iterations
> then the JIT compiler has not removed the method from execution. The dummy
> RNG then serves as a baseline for comparison of true implementations.
> This idea with examples is shown in
> [RNG-87|https://issues.apache.org/jira/browse/RNG-87] which tested a variant
> of the MWC_256 algorithm.