[jira] [Comment Edited] (RNG-50) PoissonSampler single use speed improvements

Alex D Herbert (JIRA) Wed, 01 Aug 2018 05:36:12 -0700


    [ 
https://issues.apache.org/jira/browse/RNG-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565206#comment-16565206
 ]


Alex D Herbert edited comment on RNG-50 at 8/1/18 12:35 PM:
------------------------------------------------------------

{quote}However what to do with your use-case (large mean single use)?
{quote}
We know that when the mean is between 40 and 80 then the {{log(n!)}} is used 
approximately 0.7 to 0.3% of the time within the algorithm loop. When the mean 
is higher this usage within the algorithm drops off.

The log gamma function used to compute {{log(n!)}} uses some double 
computations in a loop of size 15 and then 2 {{Math.log}} calls. A cache of 
just the {{log(n!)}} values is unlikely to make much of a speed difference 
inside the loop as there a fair few other things happening including Gaussian 
and Exponential sampling using the RNG. (But of course a JMH benchmark would 
avoid presumptions.)

However it will also be used once per sample at the value of {{(int)mean}} at 
the initialisation of the algorithm.

The other pre-computed values for the Poisson sample use 2 {{Math.exp}}, 2 
{{Math.log}} calls and 2 {{Math.sqrt}} calls. All of these values are based 
purely on the integer value of the mean.

So there may be some value in passing in a cache of not only {{log(n!)}} but 
also the entire initialisation state of the LargeMeanPoissonSampler if the 
range of the mean is known.

The cache could perform lazy initialisation something like:
{code:java}
package org.apache.commons.rng.sampling.distribution;

import org.apache.commons.rng.sampling.distribution.InternalUtils.FactorialLog;

/**
 * Compute initialisation state for the LargeMeanPoissonSampler caching values
 * in a set range.
 */
public class LargeMeanPoissonSamplerCache {

    /** Class to compute {@code log(n!)}. This has no cached values. */
    private static final InternalUtils.FactorialLog NO_CACHE_FACTORIAL_LOG;

    static {
        NO_CACHE_FACTORIAL_LOG = FactorialLog.create();
    }

    /** The minimum N covered by the cache. */
    private final int minN;
    /** The maximum N covered by the cache. */
    private final int maxN;
    /** The cache of initialisation states between {@link minN} and {@link 
maxN}. */
    private final double[][] values;

    /**
     * @param minN The minimum N covered by the cache.
     * @param maxN The maximum N covered by the cache.
     * @throws IllegalArgumentException if {@code mean <= 0}.
     */
    public LargeMeanPoissonSamplerCache(int minN, int maxN) {
        if (minN < 0) {
            throw new IllegalArgumentException("MinN: " + minN + " <= " + 0);
        }
        if (maxN <= minN) {
            throw new IllegalArgumentException("MaxN: " + maxN + " <= " + minN);
        }
        this.minN = minN;
        this.maxN = maxN;
        values = new double[maxN - minN + 1][];
    }

    /**
     * Get the initialisation state of the LargeMeanPoissonSampler.
     *
     * @param lambda The integer value of the Poisson mean 
     *               ({@code Math.floor(mean)}).
     * @return the initialisation state
     */
    double[] getState(int lambda) {
        if (lambda < minN || lambda > maxN)
            return computeState(lambda);
        final int index = lambda - minN;
        double[] value = values[index];
        if (value == null) {
            // Compute and store for reuse
            value = computeState(lambda);
            values[index] = value;
        }
        return value;
    }
    
    /**
     * Compute the initialisation state of the LargeMeanPoissonSampler.
     *
     * @param lambda The integer value of the Poisson mean ({@code 
Math.floor(mean)}).
     * @return the initialisation state
     */
    private final static double[] computeState(int lambda) {
        final double logLambda = Math.log(lambda);
        final double logLambdaFactorial = NO_CACHE_FACTORIAL_LOG.value(lambda);
        final double delta = Math.sqrt(lambda * Math.log(32 * lambda / Math.PI 
+ 1));
        final double halfDelta = delta / 2;
        final double twolpd = 2 * lambda + delta;
        final double c1 = 1 / (8 * lambda);
        final double a1 = Math.sqrt(Math.PI * twolpd) * Math.exp(c1);
        final double a2 = (twolpd / delta) * Math.exp(-delta * (1 + delta) / 
twolpd);
        final double aSum = a1 + a2 + 1;
        final double p1 = a1 / aSum;
        final double p2 = a2 / aSum;
        return new double[] { logLambda, logLambdaFactorial, delta, halfDelta, 
twolpd, p1, p2, c1 };
    }
}
{code}
This was a simple example using {{double[]}} but the state could be wrapped as 
an immutable object.

The LargeMeanPoissonSampler would then have to do this for initialisation:
{code:java}
lambda = Math.floor(mean);
initialise(cache.getState((int) lambda);
{code}


was (Author: alexherbert):
{quote}However what to do with your use-case (large mean single use)?
{quote}
We know that when the mean is between 40 and 80 then the {{log(n!)}} is used 
approximately 0.7 to 0.3% of the time within the algorithm loop. When the mean 
is higher this usage within the algorithm drops off.

The log gamma function used to compute {{log(n!)}} uses some double 
computations in a loop of size 15 and then 2 {{Math.log}} calls. A cache of 
just the {{log(n!)}} values is unlikely to make much of a speed difference 
inside the loop as there a fair few other things happening including Gaussian 
and Exponential sampling using the RNG. (But of course a JMH benchmark would 
avoid presumptions.)

However it will also be used once per sample at the value of {{(int)mean}} at 
the initialisation of the algorithm.

The other pre-computed values for the Poisson sample use 2 {{Math.exp}}, 2 
{{Math.log}} calls and 2 {{Math.sqrt}} calls. All of these values are based 
purely on the integer value of the mean.

So there may be some value in passing in a cache of not only {{log(n!)}} but 
also the entire initialisation state of the LargeMeanPoissonSampler if the 
range of the mean is known.

The cache could perform lazy initialisation something like:
{code:java}
package org.apache.commons.rng.sampling.distribution;

import org.apache.commons.rng.sampling.distribution.InternalUtils.FactorialLog;

/**
 * Compute initialisation state for the LargeMeanPoissonSampler caching values
 * in a set range.
 */
public class LargeMeanPoissonSamplerCache {

    /** Class to compute {@code log(n!)}. This has no cached values. */
    private static final InternalUtils.FactorialLog NO_CACHE_FACTORIAL_LOG;

    static {
        NO_CACHE_FACTORIAL_LOG = FactorialLog.create();
    }

    /** The minimum N covered by the cache. */
    private final int minN;
    /** The maximum N covered by the cache. */
    private final int maxN;
    /** The cache of initialisation states between {@link minN} and {@link 
maxN}. */
    private final double[][] values;

    /**
     * @param minN The minimum N covered by the cache.
     * @param maxN The maximum N covered by the cache.
     * @throws IllegalArgumentException if {@code mean <= 0}.
     */
    public LargeMeanPoissonSamplerCache(int minN, int maxN) {
        if (minN < 0) {
            throw new IllegalArgumentException("MinN: " + minN + " <= " + 0);
        }
        if (maxN <= minN) {
            throw new IllegalArgumentException("MaxN: " + maxN + " <= " + minN);
        }
        this.minN = minN;
        this.maxN = maxN;
        values = new double[maxN - minN + 1][];
    }

    /**
     * Get the initialisation state of the LargeMeanPoissonSampler.
     *
     * @param lambda The integer value of the Poisson mean ({@code 
Math.floor(mean)}).
     * @return the initialisation state
     */
    double[] getState(int lambda) {
        if (lambda < minN || lambda > maxN)
            return computeState(lambda);
        final int index = lambda - minN;
        double[] value = values[index];
        if (value == null) {
            // Compute and store for reuse
            value = computeState(lambda);
            values[index] = value;
        }
        return value;
    }
    
    /**
     * Compute the initialisation state of the LargeMeanPoissonSampler.
     *
     * @param lambda The integer value of the Poisson mean ({@code 
Math.floor(mean)}).
     * @return the initialisation state
     */
    private final static double[] computeState(int lambda) {
        final double logLambda = Math.log(lambda);
        final double logLambdaFactorial = NO_CACHE_FACTORIAL_LOG.value(lambda);
        final double delta = Math.sqrt(lambda * Math.log(32 * lambda / Math.PI 
+ 1));
        final double halfDelta = delta / 2;
        final double twolpd = 2 * lambda + delta;
        final double c1 = 1 / (8 * lambda);
        final double a1 = Math.sqrt(Math.PI * twolpd) * Math.exp(c1);
        final double a2 = (twolpd / delta) * Math.exp(-delta * (1 + delta) / 
twolpd);
        final double aSum = a1 + a2 + 1;
        final double p1 = a1 / aSum;
        final double p2 = a2 / aSum;
        return new double[] { logLambda, logLambdaFactorial, delta, halfDelta, 
twolpd, p1, p2, c1 };
    }
}
{code}

This was a simple example using {{double[]}} but the state could be wrapped as 
an immutable object.

The LargeMeanPoissonSampler would then have to do this for initialisation:

{code:java}
lambda = Math.floor(mean);
initialise(cache.getState((int) lambda);
{code}

> PoissonSampler single use speed improvements
> --------------------------------------------
>
>                 Key: RNG-50
>                 URL: https://issues.apache.org/jira/browse/RNG-50
>             Project: Commons RNG
>          Issue Type: Improvement
>    Affects Versions: 1.0
>            Reporter: Alex D Herbert
>            Priority: Minor
>         Attachments: PoissonSamplerTest.java, jmh-result.csv
>
>
> The Sampler architecture of {{org.apache.commons.rng.sampling.distribution}} 
> is nicely written for fast sampling of small dataset sizes. The constructors 
> for the samplers do not check the input parameters are valid for the 
> respective distributions (in contrast to the old 
> {{org.apache.commons.math3.random.distribution}} classes). I assume this is a 
> design choice for speed. Thus most of the samplers can be used within a loop 
> to sample just one value with very little overhead.
> The {{PoissonSampler}} precomputes log factorial numbers upon construction if 
> the mean is above 40. This is done using the {{InternalUtils.FactorialLog}} 
> class. As of version 1.0 this internal class is currently only used in the 
> {{PoissonSampler}}.
> The cache size is limited to 2*PIVOT (where PIVOT=40). But it creates and 
> precomputes the cache every time a PoissonSampler is constructed if the mean 
> is above the PIVOT value.
> Why not create this once in a static block for the PoissonSampler?
> {code:java}
> /** {@code log(n!)}. */
> private static final FactorialLog factorialLog;
>      
> static 
> {
>     factorialLog = FactorialLog.create().withCache((int) (2 * 
> PoissonSampler.PIVOT));
> }
> {code}
> This will make the construction cost of a new {{PoissonSampler}} negligible. 
> If the table is computed dynamically as a static construction method then the 
> overhead will be in the first use. Thus the following call will be much 
> faster:
> {code:java}
> UniformRandomProvider rng = ...;
> int value = new PoissonSampler(rng, 50).sample();
> {code}
> I have tested this modification (see attached file) and the results are:
> {noformat}
> Mean 40  Single construction ( 7330792) vs Loop construction                  
>         (24334724)   (3.319522.2x faster)
> Mean 40  Single construction ( 7330792) vs Loop construction with static 
> FactorialLog ( 7990656)   (1.090013.2x faster)
> Mean 50  Single construction ( 6390303) vs Loop construction                  
>         (19389026)   (3.034132.2x faster)
> Mean 50  Single construction ( 6390303) vs Loop construction with static 
> FactorialLog ( 6146556)   (0.961857.2x faster)
> Mean 60  Single construction ( 6041165) vs Loop construction                  
>         (21337678)   (3.532047.2x faster)
> Mean 60  Single construction ( 6041165) vs Loop construction with static 
> FactorialLog ( 5329129)   (0.882136.2x faster)
> Mean 70  Single construction ( 6064003) vs Loop construction                  
>         (23963516)   (3.951765.2x faster)
> Mean 70  Single construction ( 6064003) vs Loop construction with static 
> FactorialLog ( 5306081)   (0.875013.2x faster)
> Mean 80  Single construction ( 6064772) vs Loop construction                  
>         (26381365)   (4.349935.2x faster)
> Mean 80  Single construction ( 6064772) vs Loop construction with static 
> FactorialLog ( 6341274)   (1.045591.2x faster)
> {noformat}
> Thus the speed improvements would be approximately 3-4 fold for single use 
> Poisson sampling.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Comment Edited] (RNG-50) PoissonSampler single use speed improvements

Reply via email to