[https://issues.apache.org/jira/browse/RNG-88?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16809735#comment-16809735]
Alex D Herbert commented on RNG-88:
-----------------------------------
I have created new JMH benchmarks for all the methods defined in
{{UniformRandomProvider}}.
A baseline implementation has been created for each method and scales linearly
with workload:
!baseline.jpg!
(Note: The {{nextBytes()}} method is plotted on the second Y-axis as the
measurement time is very different.)
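The baseline providers simply return state variables (not final values) so
that the JIT compiler cannot remove the method calls from the execution path.
A hypothetical sketch (the actual classes in the PR are obtained via
{{BaselineUtils}}):
{code:java}
import org.apache.commons.rng.UniformRandomProvider;

/**
 * Sketch of a baseline provider: every method returns a non-final field
 * (or does nothing) so the call cannot be constant-folded away.
 */
public class BaselineProvider implements UniformRandomProvider {
    /** State variables returned by the dummy methods. */
    private int intValue;
    private long longValue;
    private float floatValue;
    private double doubleValue;
    private boolean booleanValue;

    @Override public void nextBytes(byte[] bytes) { /* no-op baseline */ }
    @Override public void nextBytes(byte[] bytes, int start, int len) { /* no-op baseline */ }
    @Override public int nextInt() { return intValue; }
    @Override public int nextInt(int n) { return intValue; }
    @Override public long nextLong() { return longValue; }
    @Override public long nextLong(long n) { return longValue; }
    @Override public boolean nextBoolean() { return booleanValue; }
    @Override public float nextFloat() { return floatValue; }
    @Override public double nextDouble() { return doubleValue; }
}
{code}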
Each method from {{UniformRandomProvider}} is then benchmarked, for each
random source, against a baseline implementation. The benchmarks are split
into separate classes that share a baseline implementation, e.g.
{{NextFloatGenerationPerformance}} for {{nextFloat()}} but
{{NextIntGenerationPerformance}} for {{nextInt()}} and {{nextInt(int)}}.
An example benchmark looks like this:
{code:java}
import org.apache.commons.rng.UniformRandomProvider;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

// AbstractBenchmark, BaselineSources and BaselineUtils are helper classes
// from the same benchmark package.

/**
 * Executes a benchmark to compare the speed of generation of random numbers
 * from the various source providers for
 * {@link UniformRandomProvider#nextDouble()}.
 */
public class NextDoubleGenerationPerformance extends AbstractBenchmark {
    /**
     * The benchmark state (retrieve the various "RandomSource"s).
     */
    @State(Scope.Benchmark)
    public static class Sources extends BaselineSources {
        @Override
        protected UniformRandomProvider createBaseline() {
            return BaselineUtils.getNextDouble();
        }
    }

    /** The value. */
    private double value;

    /**
     * Baseline for a JMH method call with no return value.
     */
    @Benchmark
    public void baselineVoid() {
        // Do nothing, this is a baseline
    }

    /**
     * Baseline for a JMH method call returning a {@code double}.
     *
     * @return the value
     */
    @Benchmark
    public double baselineDouble() {
        return value;
    }

    /**
     * Exercise the {@link UniformRandomProvider#nextDouble()} method.
     *
     * @param sources Source of randomness.
     * @return the double
     */
    @Benchmark
    public double nextDouble(Sources sources) {
        return sources.getGenerator().nextDouble();
    }
}
{code}
The {{BaselineSources}} class declares all the RNG providers plus a keyword
{{BASELINE}}. When this keyword is identified it calls the method to create
the baseline; otherwise it creates the {{RandomSource}} using the standard
factory method.
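A sketch of the idea (the field and method names here are illustrative; see
the PR for the actual class):
{code:java}
import org.apache.commons.rng.UniformRandomProvider;
import org.apache.commons.rng.simple.RandomSource;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Setup;

/**
 * Sketch of the shared state. Concrete subclasses are annotated with
 * {@code @State(Scope.Benchmark)} (as in the example above) and provide
 * the baseline for the method being benchmarked.
 */
public abstract class BaselineSources {
    /** The keyword identifying the baseline implementation. */
    private static final String BASELINE = "BASELINE";

    /** All the random sources under test, plus the baseline keyword. */
    @Param({BASELINE, "JDK", "SPLIT_MIX_64", "MWC_256" /* , ... */})
    private String randomSourceName;

    /** The generator exercised by the benchmark methods. */
    private UniformRandomProvider provider;

    /** Create the generator, or the baseline when the keyword matches. */
    @Setup
    public void setup() {
        provider = BASELINE.equals(randomSourceName)
            ? createBaseline()
            : RandomSource.create(RandomSource.valueOf(randomSourceName));
    }

    /** @return the generator to exercise. */
    public UniformRandomProvider getGenerator() {
        return provider;
    }

    /** @return the baseline implementation. */
    protected abstract UniformRandomProvider createBaseline();
}
{code}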
I found stable results on my test machine using 10 warm-up iterations and 10
measurement iterations of 1 second each.
{noformat}
Intel(R) Xeon(R) CPU E5-1680 v3 @ 3.20GHz
openjdk version "1.8.0_191"
Linux gc04016493.gdsc.susx.ac.uk 4.4.0-145-generic #171-Ubuntu SMP Tue Mar 26
12:43:40 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
{noformat}
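These warm-up/measurement settings correspond to JMH annotations along the
following lines (a minimal sketch; the same values can be supplied on the JMH
command line instead):
{code:java}
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Warmup;

/** 10 warm-up and 10 measurement iterations of 1 second each. */
@Warmup(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
public class BenchmarkSettings {
    // Benchmark methods would be declared in classes carrying these
    // annotations (e.g. via a shared abstract parent).
}
{code}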
I ran all the benchmarks, subtracted the BASELINE time from the runtime
metric, then expressed the results relative to the JDK. I found the median of
10 runs to be more stable than the mean as a few outliers can skew the score.
Here are the relative median scores:
!next.png!
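(Each relative score above is computed as follows; this is a sketch of the
arithmetic, not code from the PR:)
{code:java}
/** Illustrates the normalisation used for the chart above. */
public class RelativeScore {
    /**
     * @param t        median time of the method for a given source (ns/op)
     * @param baseline median time of the BASELINE implementation (ns/op)
     * @param jdk      median time of the same method for the JDK source (ns/op)
     * @return the overhead-corrected time relative to the JDK
     */
    static double relativeScore(double t, double baseline, double jdk) {
        return (t - baseline) / (jdk - baseline);
    }
}
{code}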
These results should be compared to the [RNG User
Guide|https://commons.apache.org/proper/commons-rng/userguide/rng.html]. Those
results show that the fastest generators are SPLIT_MIX, XOR_SHIFT_1024_S,
TWO_CMRES and MWC_256, and the slowest are JDK and WELL_44497x. This chart
shows the same ranking but the differences are more pronounced.
For reference, the results for SplitMix in the user guide (v1.2) versus the
new results:
||Method||New||Old||
|nextLong|0.0806000708|0.21751|
|nextBytes|0.2179846927| |
|nextDouble|0.1183835585|0.27611|
|nextLongN|0.1803571971| |
|nextInt|0.1618788839|0.41479|
|nextFloat|0.2567586134| |
|nextIntN|0.3662874911| |
|nextBoolean|1.176949893|1.1214|
In the case of the SplitMix the JMH time-measurement overhead for
{{nextLong()}} is 63% of the total time, so subtracting the baseline makes the
differences between a fast generator and a slow one more pronounced (hence the
new relative score for {{nextLong}}, 0.081, is much smaller than the old
0.218).
I also note the strangely fast speed of {{nextBoolean()}} for the JDK
implementation: it is the fastest despite the JDK not being a fast generator
of {{int}} values. The implementation of this method is the standard one for
an {{IntProvider}}: store a single {{int}} value and then use each consecutive
bit of the 32 bits for a {{boolean}}. So the speed-up must be related to the
relative speed of testing the bits from the JDK source versus the others, and
we know from the BigCrush results that the JDK bit source is not good.
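For reference, the bit-caching scheme reads like this (an illustrative sketch
of the {{IntProvider}} approach; the exact field and method names may differ):
{code:java}
/** Sketch of serving booleans one bit at a time from a cached int. */
public abstract class BitCachingProvider {
    /** Cached 32-bit source of boolean bits. */
    private int booleanSource;
    /** Single-bit mask marking the next bit to use; zero when exhausted. */
    private int booleanBitMask;

    /** @return the next 32-bit value from the underlying generator. */
    protected abstract int next();

    /** @return a boolean using one bit of the cached value. */
    public boolean nextBoolean() {
        int bitMask = booleanBitMask << 1;
        if (bitMask == 0) {
            // All 32 bits used: refill the cache and restart at the
            // least significant bit.
            bitMask = 1;
            booleanSource = next();
        }
        booleanBitMask = bitMask;
        return (booleanSource & bitMask) != 0;
    }
}
{code}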
I've opened [PR 35|https://github.com/apache/commons-rng/pull/35] with the new
code.
This idea can be applied to other benchmarks that currently use a loop to test
performance and do not require input data/state changes on each loop
iteration. These are:
* GeometricSamplersPerformance
* SamplersPerformance
* NextGaussianPerformance
In the case of SamplersPerformance it may be better to separate the
{{DiscreteSampler}} and {{ContinuousSampler}} benchmarks into two classes that
can each have a baseline implementation of the appropriate sampler interface,
for example the sketch below.
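A baseline for the {{DiscreteSampler}} interface could be as simple as (a
hypothetical sketch):
{code:java}
import org.apache.commons.rng.sampling.distribution.DiscreteSampler;

/**
 * Hypothetical baseline sampler. As with the baseline RNG, the returned
 * value is a non-final field so the JIT compiler cannot remove the call.
 */
public class BaselineDiscreteSampler implements DiscreteSampler {
    /** A stored value. */
    private int value;

    @Override
    public int sample() {
        return value;
    }
}
{code}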
I can do this in a new PR, the same PR or directly within master.
The following benchmarks currently exploit the loop used in the benchmark
method and cannot be modified in this way:
* ThreadLocalPerformance
* PoissonSamplerCachePerformance
> Update the GenerationPerformance benchmark
> ------------------------------------------
>
> Key: RNG-88
> URL: https://issues.apache.org/jira/browse/RNG-88
> Project: Commons RNG
> Issue Type: Improvement
> Components: examples
> Affects Versions: 1.3
> Reporter: Alex D Herbert
> Assignee: Alex D Herbert
> Priority: Minor
> Attachments: baseline.jpg, next.png
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The current GenerationPerformance benchmark runs all the generators to
> collect timing data. However the act of running the test within JMH takes
> some time (test overhead). This overhead should be measured and subtracted
> from the timing data to create a time attributed only to the method exercised
> in the RNG.
> This can be done by creating a dummy RNG that returns a fixed value. The
> implementation must be done in a manner where the JIT compiler is not able to
> remove the dummy method from the execution path. This is achieved by
> returning state variables from the dummy RNG (not final variables).
> Testing can be done using a variable number of iterations and the run-times
> assessed for linearity. If the run-times scale with the number of iterations
> then the JIT compiler has not removed the method from execution. The dummy
> RNG then serves as a baseline for comparison of true implementations.
> This idea with examples is shown in
> [RNG-87|https://issues.apache.org/jira/browse/RNG-87] which tested a variant
> of the MWC_256 algorithm.