[ 
https://issues.apache.org/jira/browse/RNG-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16560319#comment-16560319
 ] 

Alex D Herbert commented on RNG-50:
-----------------------------------

{quote}Would it be good enough for your use-case?{quote}

Yes. That is what I mocked up in the simple test attached to this ticket. The 
effect is that repeated single use of the {{PoissonSampler}} is very close to 
re-use with the same sampler.

Off topic: Where would be a good place to provide feedback on the library? For 
example there is no way to programatically obtain the information that is in 
the Javadoc of {{org.apache.commons.rng.simple.RandomSource}}. This could be 
done with e.g:

{code:java}
public enum RandomSource {
    /**
     * Source of randomness is {@link 
org.apache.commons.rng.core.source32.JDKRandom}.
     * <ul>
     *  <li>Native seed type: {@code Long}.</li>
     *  <li>Native seed size: 1.</li>
     * </ul>
     */
    JDK(ProviderBuilder.RandomSourceInternal.JDK) { 
          public int getSeedSize() { return 1; },
     },

     // etc ...
     ;

     public Class<?> getSeedType() { getInternalIdentifier().getSeed(); }
     public abstract int getSeedSize();
}
{code}

There is obviously some overlap with the 
{{ProviderBuilder.RandomSourceInternal}} and it would be good to at least 
expose the arguments required for 

{code:java}
    public static RestorableUniformRandomProvider create(RandomSource source,
                                                         Object seed,
                                                         Object ... data)
{code}

I am only asking as I am creating a seed for the {{Well19937c}} class that I 
can use to create it repeatedly with little cost. If I know the seed length 
then I can store the seed and use it for the constructor. I can then change the 
type of RNG dynamically and appropriately store the correct seed. At the moment 
I would have to create a function to return the seed size and type for each RNG 
I want to support using the information in the Javadocs.

As far as I can see the use of the {{RestorableUniformRandomProvider}} requires 
that you first construct a provider (so paying the seeding cost) and then you 
can restore it to a previous state. So the initialisation with the constructor 
seed is a waste. It would be nice to have a constructor or factory method that 
accepts a {{RandomProviderState}} instead.

I am sure as a I get more familiar with the library I'd like to give more 
feedback so is this the best place for that (i.e. new feature request tickets)?



> PoissonSampler single use speed improvements
> --------------------------------------------
>
>                 Key: RNG-50
>                 URL: https://issues.apache.org/jira/browse/RNG-50
>             Project: Commons RNG
>          Issue Type: Improvement
>    Affects Versions: 1.0
>            Reporter: Alex D Herbert
>            Priority: Minor
>         Attachments: PoissonSamplerTest.java
>
>
> The Sampler architecture of {{org.apache.commons.rng.sampling.distribution}} 
> is nicely written for fast sampling of small dataset sizes. The constructors 
> for the samplers do not check the input parameters are valid for the 
> respective distributions (in contrast to the old 
> {{org.apache.commons.math3.random.distribution}} classes). I assume this is a 
> design choice for speed. Thus most of the samplers can be used within a loop 
> to sample just one value with very little overhead.
> The {{PoissonSampler}} precomputes log factorial numbers upon construction if 
> the mean is above 40. This is done using the {{InternalUtils.FactorialLog}} 
> class. As of version 1.0 this internal class is currently only used in the 
> {{PoissonSampler}}.
> The cache size is limited to 2*PIVOT (where PIVOT=40). But it creates and 
> precomputes the cache every time a PoissonSampler is constructed if the mean 
> is above the PIVOT value.
> Why not create this once in a static block for the PoissonSampler?
> {code:java}
> /** {@code log(n!)}. */
> private static final FactorialLog factorialLog;
>      
> static 
> {
>     factorialLog = FactorialLog.create().withCache((int) (2 * 
> PoissonSampler.PIVOT));
> }
> {code}
> This will make the construction cost of a new {{PoissonSampler}} negligible. 
> If the table is computed dynamically as a static construction method then the 
> overhead will be in the first use. Thus the following call will be much 
> faster:
> {code:java}
> UniformRandomProvider rng = ...;
> int value = new PoissonSampler(rng, 50).sample();
> {code}
> I have tested this modification (see attached file) and the results are:
> {noformat}
> Mean 40  Single construction ( 7330792) vs Loop construction                  
>         (24334724)   (3.319522.2x faster)
> Mean 40  Single construction ( 7330792) vs Loop construction with static 
> FactorialLog ( 7990656)   (1.090013.2x faster)
> Mean 50  Single construction ( 6390303) vs Loop construction                  
>         (19389026)   (3.034132.2x faster)
> Mean 50  Single construction ( 6390303) vs Loop construction with static 
> FactorialLog ( 6146556)   (0.961857.2x faster)
> Mean 60  Single construction ( 6041165) vs Loop construction                  
>         (21337678)   (3.532047.2x faster)
> Mean 60  Single construction ( 6041165) vs Loop construction with static 
> FactorialLog ( 5329129)   (0.882136.2x faster)
> Mean 70  Single construction ( 6064003) vs Loop construction                  
>         (23963516)   (3.951765.2x faster)
> Mean 70  Single construction ( 6064003) vs Loop construction with static 
> FactorialLog ( 5306081)   (0.875013.2x faster)
> Mean 80  Single construction ( 6064772) vs Loop construction                  
>         (26381365)   (4.349935.2x faster)
> Mean 80  Single construction ( 6064772) vs Loop construction with static 
> FactorialLog ( 6341274)   (1.045591.2x faster)
> {noformat}
> Thus the speed improvements would be approximately 3-4 fold for single use 
> Poisson sampling.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to