[jira] [Commented] (RNG-50) PoissonSampler single use speed improvements

Alex D Herbert (JIRA) Tue, 31 Jul 2018 10:11:15 -0700


    [ 
https://issues.apache.org/jira/browse/RNG-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564006#comment-16564006
 ]


Alex D Herbert commented on RNG-50:
-----------------------------------

I've built a simple test using JHM based on 
{{org.apache.commons.rng.sampling.distribution.PoissonSamplersPerformance}} in 
the commons-rng-examples-jmh project.

I note that in that class the {{Sources}} nested class returns the 
{{UniformRandomProvider}} without resetting the state. For consistency I 
changed it to:
{code:java}
/**
 * The benchmark state (retrieve the various "RandomSource"s).
 */
@State(Scope.Benchmark)
public static class Sources {
    /**
     * RNG providers.
     */
    @Param({
            "WELL_19937_C",
            "WELL_44497_B",
            "SPLIT_MIX_64"
            // and the rest ...
        })
    private String randomSourceName;

    /** RNG. */
    private RestorableUniformRandomProvider generator;

    /**
     * The state of the generator at the start of the test (for reproducible 
results).
     */
    private RandomProviderState state;

    /**
     * @return the RNG.
     */
    public UniformRandomProvider getGenerator() {
        generator.restoreState(state);
        return generator;
    }

    /** Instantiates generator. */
    @Setup
    public void setup() {
        final RandomSource randomSource = 
RandomSource.valueOf(randomSourceName);
        generator = RandomSource.create(randomSource);
        state = generator.saveState();
    }
}
{code}
I've done a simple run with a small mean and large mean Poisson sampler and one 
that appropriately wraps them. I'll repeat this later with more loops and 
different RNGs but here are the initial results.
||Benchmark||Mode||Threads||Samples||Score||Score Error (99.9%)||Unit||Param: 
mean||Param: randomSourceName||
|LargeMeanPoissonSampler|avgt|1|5|2009.632668|263.007827|us/op|40.3|SPLIT_MIX_64|
|LargeMeanPoissonSampler|avgt|1|5|1976.939758|40.980963|us/op|57.9|SPLIT_MIX_64|
|WrappedPoissonSampler|avgt|1|5|1932.500790|13.686808|us/op|40.3|SPLIT_MIX_64|
|WrappedPoissonSampler|avgt|1|5|1983.504209|21.960997|us/op|57.9|SPLIT_MIX_64|
|PoissonSampler|avgt|1|5|4691.596605|87.865287|us/op|40.3|SPLIT_MIX_64|
|PoissonSampler|avgt|1|5|4742.322853|14.968683|us/op|57.9|SPLIT_MIX_64|
|WrappedPoissonSampler|avgt|1|5|295.546146|1.960616|us/op|5.3|SPLIT_MIX_64|
|WrappedPoissonSampler|avgt|1|5|363.830951|1.737449|us/op|8.5|SPLIT_MIX_64|
|WrappedPoissonSampler|avgt|1|5|972.160790|9.791746|us/op|35.7|SPLIT_MIX_64|
|PoissonSampler|avgt|1|5|685.555186|1.072801|us/op|5.3|SPLIT_MIX_64|
|PoissonSampler|avgt|1|5|753.809844|1.787944|us/op|8.5|SPLIT_MIX_64|
|PoissonSampler|avgt|1|5|1342.103690|8.041261|us/op|35.7|SPLIT_MIX_64|
|SmallMeanPoissonSampler|avgt|1|5|279.659204|1.052956|us/op|5.3|SPLIT_MIX_64|
|SmallMeanPoissonSampler|avgt|1|5|352.463058|1.448008|us/op|8.5|SPLIT_MIX_64|
|SmallMeanPoissonSampler|avgt|1|5|965.718523|4.353908|us/op|35.7|SPLIT_MIX_64|

Rearranged:
||Source||Mean||Name||Relative Score||
|SPLIT_MIX_64|5.3|PoissonSampler|1|
|SPLIT_MIX_64|5.3|WrappedPoissonSampler|0.431104821370281|
|SPLIT_MIX_64|5.3|SmallMeanPoissonSampler|0.407930987484354|
|SPLIT_MIX_64|8.5|PoissonSampler|1|
|SPLIT_MIX_64|8.5|WrappedPoissonSampler|0.482656141858503|
|SPLIT_MIX_64|8.5|SmallMeanPoissonSampler|0.467575557424002|
|SPLIT_MIX_64|35.7|PoissonSampler|1|
|SPLIT_MIX_64|35.7|WrappedPoissonSampler|0.724355947490168|
|SPLIT_MIX_64|35.7|SmallMeanPoissonSampler|0.719555821353863|
|SPLIT_MIX_64|40.3|PoissonSampler|1|
|SPLIT_MIX_64|40.3|LargeMeanPoissonSampler|0.428347284985726|
|SPLIT_MIX_64|40.3|WrappedPoissonSampler|0.411906852336892|
|SPLIT_MIX_64|57.9|PoissonSampler|1|
|SPLIT_MIX_64|57.9|WrappedPoissonSampler|0.41825583590228|
|SPLIT_MIX_64|57.9|LargeMeanPoissonSampler|0.416871608973097|

> PoissonSampler single use speed improvements
> --------------------------------------------
>
>                 Key: RNG-50
>                 URL: https://issues.apache.org/jira/browse/RNG-50
>             Project: Commons RNG
>          Issue Type: Improvement
>    Affects Versions: 1.0
>            Reporter: Alex D Herbert
>            Priority: Minor
>         Attachments: PoissonSamplerTest.java
>
>
> The Sampler architecture of {{org.apache.commons.rng.sampling.distribution}} 
> is nicely written for fast sampling of small dataset sizes. The constructors 
> for the samplers do not check the input parameters are valid for the 
> respective distributions (in contrast to the old 
> {{org.apache.commons.math3.random.distribution}} classes). I assume this is a 
> design choice for speed. Thus most of the samplers can be used within a loop 
> to sample just one value with very little overhead.
> The {{PoissonSampler}} precomputes log factorial numbers upon construction if 
> the mean is above 40. This is done using the {{InternalUtils.FactorialLog}} 
> class. As of version 1.0 this internal class is currently only used in the 
> {{PoissonSampler}}.
> The cache size is limited to 2*PIVOT (where PIVOT=40). But it creates and 
> precomputes the cache every time a PoissonSampler is constructed if the mean 
> is above the PIVOT value.
> Why not create this once in a static block for the PoissonSampler?
> {code:java}
> /** {@code log(n!)}. */
> private static final FactorialLog factorialLog;
>      
> static 
> {
>     factorialLog = FactorialLog.create().withCache((int) (2 * 
> PoissonSampler.PIVOT));
> }
> {code}
> This will make the construction cost of a new {{PoissonSampler}} negligible. 
> If the table is computed dynamically as a static construction method then the 
> overhead will be in the first use. Thus the following call will be much 
> faster:
> {code:java}
> UniformRandomProvider rng = ...;
> int value = new PoissonSampler(rng, 50).sample();
> {code}
> I have tested this modification (see attached file) and the results are:
> {noformat}
> Mean 40  Single construction ( 7330792) vs Loop construction                  
>         (24334724)   (3.319522.2x faster)
> Mean 40  Single construction ( 7330792) vs Loop construction with static 
> FactorialLog ( 7990656)   (1.090013.2x faster)
> Mean 50  Single construction ( 6390303) vs Loop construction                  
>         (19389026)   (3.034132.2x faster)
> Mean 50  Single construction ( 6390303) vs Loop construction with static 
> FactorialLog ( 6146556)   (0.961857.2x faster)
> Mean 60  Single construction ( 6041165) vs Loop construction                  
>         (21337678)   (3.532047.2x faster)
> Mean 60  Single construction ( 6041165) vs Loop construction with static 
> FactorialLog ( 5329129)   (0.882136.2x faster)
> Mean 70  Single construction ( 6064003) vs Loop construction                  
>         (23963516)   (3.951765.2x faster)
> Mean 70  Single construction ( 6064003) vs Loop construction with static 
> FactorialLog ( 5306081)   (0.875013.2x faster)
> Mean 80  Single construction ( 6064772) vs Loop construction                  
>         (26381365)   (4.349935.2x faster)
> Mean 80  Single construction ( 6064772) vs Loop construction with static 
> FactorialLog ( 6341274)   (1.045591.2x faster)
> {noformat}
> Thus the speed improvements would be approximately 3-4 fold for single use 
> Poisson sampling.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (RNG-50) PoissonSampler single use speed improvements

Reply via email to