[jira] [Comment Edited] (RNG-50) PoissonSampler single use speed improvements

Gilles (JIRA) Fri, 27 Jul 2018 03:38:14 -0700


    [ 
https://issues.apache.org/jira/browse/RNG-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559070#comment-16559070
 ]


Gilles edited comment on RNG-50 at 7/27/18 10:37 AM:
-----------------------------------------------------

{quote}It doesn't check the RNG is not null.
{quote}
Passing {{null}} is a programming error, and the check is going to be performed 
by the JVM; since the call is quite low-level, the NPE will be thrown fairly 
close to the error; so indeed no need (IMHO) to duplicate such checks.
{quote}AhrensDieterMarsagliaTsangGammaSampler where the constructors params are 
not checked.
{quote}
Oh. That was not intentional; I'd be inclined to introduce sanity checks. ;)
 Do you think that it would make such a big difference?
{quote}use case of the Poisson sampling is to model photon shot noise on a 
camera image, e.g 512x512 is 262,144 pixels. Each pixel value is likely to be 
unique during a simulation that creates floating-point images.
{quote}
Interesting!
 I don't quite understand why "each pixel value is likely to be unique" entails 
that the sampler is to be thrown away after a single call to the {{sample()}} 
method. Is the {{mean}} different for each pixel and frame?
 I'm asking because if there is no other way than instantiating an sampler for 
a single sample, the API is not good for that: there is more overhead than the 
factorial cache (instantiation of 2 other objects). A new class should provide 
a {{sample(double mean)}} method (It would not fit the {{DiscreteSampler}} API).
{quote}This is a sub-optimal cache.
{quote}
I know; it was under the assumption that the number of calls to {{sample()}}, 
with all its non-trivial computations, would generally overwhelm the running 
time.
{quote}Subsequent uses would not suffer.
{quote}
Sure, I agree. But the concern was to avoid multi-thread complication.
{quote}A better solution [...] LogFactorial [...]
{quote}
Using {{synchronized}} has a performance impact.
 I'm no multi-thread expert but I suspect that your code does not synchronize 
consistently: Couldn't
{code:java}
getLogF(n){code}
read the table before a new value has been written to slot "n", thus returning 
0?
{quote}The master table can be increased in size and reduced.
{quote}
When would one want to reduce it?
{quote}If the FactorialLog class is modified then when it is used by other 
samplers then they can take advantage of any already computed values.
{quote}
Sure. But is the performance gain worth the added complexity? And how much is 
that gain? Aren't there cases where there is a loss in performance due to the 
synchronization?
 Wouldn't it be sufficient to define a constructor where the (precomputed, 
immutable) cache is passed as an argument?


was (Author: erans):
{quote}It doesn't check the RNG is not null.
{quote}
Passing {{null }}is a programming error, and the check is going to be performed 
by the JVM; since the call is quite low-level, the NPE will be thrown fairly 
close to the error; so indeed no need (IMHO) to duplicate such checks.
{quote}AhrensDieterMarsagliaTsangGammaSampler where the constructors params are 
not checked.
{quote}
Oh. That was not intentional; I'd be inclined to introduce sanity checks. ;)
 Do you think that it would make such a big difference?
{quote}use case of the Poisson sampling is to model photon shot noise on a 
camera image, e.g 512x512 is 262,144 pixels. Each pixel value is likely to be 
unique during a simulation that creates floating-point images.
{quote}
Interesting!
 I don't quite understand why "each pixel value is likely to be unique" entails 
that the sampler is to be thrown away after a single call to the {{sample()}} 
method. Is the {{mean}} different for each pixel and frame?
 I'm asking because if there is no other way than instantiating an sampler for 
a single sample, the API is not good for that: there is more overhead than the 
factorial cache (instantiation of 2 other objects). A new class should provide 
a {{sample(double mean)}} method (It would not fit the {{DiscreteSampler}} API).
{quote}This is a sub-optimal cache.
{quote}
I know; it was under the assumption that the number of calls to {{sample()}}, 
with all its non-trivial computations, would generally overwhelm the running 
time.
{quote}Subsequent uses would not suffer.
{quote}
Sure, I agree. But the concern was to avoid multi-thread complication.
{quote}A better solution [...] LogFactorial [...]
{quote}
Using {{synchronized}} has a performance impact.
 I'm no multi-thread expert but I suspect that your code does not synchronize 
consistently: Couldn't
{code:java}
getLogF(n){code}
read the table before a new value has been written to slot "n", thus returning 
0?
{quote}The master table can be increased in size and reduced.
{quote}
When would one want to reduce it?
{quote}If the FactorialLog class is modified then when it is used by other 
samplers then they can take advantage of any already computed values.
{quote}
Sure. But is the performance gain worth the added complexity? And how much is 
that gain? Aren't there cases where there is a loss in performance due to the 
synchronization?
 Wouldn't it be sufficient to define a constructor where the (precomputed, 
immutable) cache is passed as an argument?

> PoissonSampler single use speed improvements
> --------------------------------------------
>
>                 Key: RNG-50
>                 URL: https://issues.apache.org/jira/browse/RNG-50
>             Project: Commons RNG
>          Issue Type: Improvement
>    Affects Versions: 1.0
>            Reporter: Alex D Herbert
>            Priority: Minor
>         Attachments: PoissonSamplerTest.java
>
>
> The Sampler architecture of {{org.apache.commons.rng.sampling.distribution}} 
> is nicely written for fast sampling of small dataset sizes. The constructors 
> for the samplers do not check the input parameters are valid for the 
> respective distributions (in contrast to the old 
> {{org.apache.commons.math3.random.distribution}} classes). I assume this is a 
> design choice for speed. Thus most of the samplers can be used within a loop 
> to sample just one value with very little overhead.
> The {{PoissonSampler}} precomputes log factorial numbers upon construction if 
> the mean is above 40. This is done using the {{InternalUtils.FactorialLog}} 
> class. As of version 1.0 this internal class is currently only used in the 
> {{PoissonSampler}}.
> The cache size is limited to 2*PIVOT (where PIVOT=40). But it creates and 
> precomputes the cache every time a PoissonSampler is constructed if the mean 
> is above the PIVOT value.
> Why not create this once in a static block for the PoissonSampler?
> {code:java}
> /** {@code log(n!)}. */
> private static final FactorialLog factorialLog;
>      
> static 
> {
>     factorialLog = FactorialLog.create().withCache((int) (2 * 
> PoissonSampler.PIVOT));
> }
> {code}
> This will make the construction cost of a new {{PoissonSampler}} negligible. 
> If the table is computed dynamically as a static construction method then the 
> overhead will be in the first use. Thus the following call will be much 
> faster:
> {code:java}
> UniformRandomProvider rng = ...;
> int value = new PoissonSampler(rng, 50).sample();
> {code}
> I have tested this modification (see attached file) and the results are:
> {noformat}
> Mean 40  Single construction ( 7330792) vs Loop construction                  
>         (24334724)   (3.319522.2x faster)
> Mean 40  Single construction ( 7330792) vs Loop construction with static 
> FactorialLog ( 7990656)   (1.090013.2x faster)
> Mean 50  Single construction ( 6390303) vs Loop construction                  
>         (19389026)   (3.034132.2x faster)
> Mean 50  Single construction ( 6390303) vs Loop construction with static 
> FactorialLog ( 6146556)   (0.961857.2x faster)
> Mean 60  Single construction ( 6041165) vs Loop construction                  
>         (21337678)   (3.532047.2x faster)
> Mean 60  Single construction ( 6041165) vs Loop construction with static 
> FactorialLog ( 5329129)   (0.882136.2x faster)
> Mean 70  Single construction ( 6064003) vs Loop construction                  
>         (23963516)   (3.951765.2x faster)
> Mean 70  Single construction ( 6064003) vs Loop construction with static 
> FactorialLog ( 5306081)   (0.875013.2x faster)
> Mean 80  Single construction ( 6064772) vs Loop construction                  
>         (26381365)   (4.349935.2x faster)
> Mean 80  Single construction ( 6064772) vs Loop construction with static 
> FactorialLog ( 6341274)   (1.045591.2x faster)
> {noformat}
> Thus the speed improvements would be approximately 3-4 fold for single use 
> Poisson sampling.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Comment Edited] (RNG-50) PoissonSampler single use speed improvements

Reply via email to