[
https://issues.apache.org/jira/browse/RNG-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559674#comment-16559674
]
Alex D Herbert commented on RNG-50:
-----------------------------------
{quote}I'd be inclined to introduce sanity checks. Do you think that it would
make such a big difference?{quote}
The sanity checks are likely to be small compared to the cost of even one
random sample. I just mentioned it as I thought it could have been a design
choice.
I agree that passing a null RNG into the method is wrong. You'll get a null
pointer exception on first use. I'm generally happy with that. I'll leave it to
you to decide based on the rest of your framework. If you do not check it then
you could document the fact.
{quote}I don't quite understand why "each pixel value is likely to be unique"
entails that the sampler is to be thrown away after a single call to the
sample() method. Is the mean different for each pixel and frame?{quote}
Yes, the mean is different for each pixel. Imagine a perfect image of anything
in the real world. The pixel values are all likely to be different.
{quote}I'm no multi-thread expert but I suspect that your code does not
synchronize consistently: Couldn't
{code:java}
getLogF(n)
{code}
read the table before a new value has been written to slot "n", thus returning
0?{quote}
The idea is that the table is only ever modified inside a synchronised block
using the same locking object. I've rewritten it to use a
{{java.util.concurrent.locks.ReadWriteLock}} to guard access instead with
better documentation of what is happening.
I've also used {{volatile}} to ensure threads have the same value of the table.
This code is an example I put together for functions that use a range of
{{log(n!)}} values. The range is likely to be between 0 and 65535 (the range of
an unsigned 16-bit image). However not all the range is required at any one
time so pre-computing up to 65535 is an overhead I removed by adding the
{{ensureRange(int,int)}} function.
There are several caveats when using the class that I hope to have now
documented. The design is not sensible outside of my own use case where the
range is known and the function is used across many classes. For more general
use then perhaps the safest is back to allowing only immutable instances to be
created, perhaps with a static internal cache of a limited size that can be
reused when creating instances.
{quote}A new class should provide a sample(double mean) method (It would not
fit the DiscreteSampler API).{quote}
I'd be OK with that. I use custom version of Commons Math
{org.apache.commons.math3.random.distribution} classes that do just that at the
moment anyway. They've just been modified to allow the constructor arguments to
be set as properties.
{quote}Wouldn't it be sufficient to define a constructor where the
(precomputed, immutable) cache is passed as an argument?{quote}
That would also be a solution. But then you have to expose the cache API.
Perhaps use the {{with}} pattern to add a method such as
{code:java}
public PoissonSampler with(double mean)
{
return new PoissonSampler(this.getUniformRandomProvider(), mean,
this.factorialLog);
}
{code}
But then the {{SamplerBase#rng}} is private so this needs to be exposed. And
currently there is no method to duplicate it so it would be a reference to the
same object.
I still think that the simple solution of a one time cache construction inside
{{PoissonSampler}} is the best option:
{code:java}
/** {@code log(n!)}. */
private static final FactorialLog factorialLog;
static
{
factorialLog = FactorialLog.create().withCache((int) (2 *
PoissonSampler.PIVOT));
}
{code}
> PoissonSampler single use speed improvements
> --------------------------------------------
>
> Key: RNG-50
> URL: https://issues.apache.org/jira/browse/RNG-50
> Project: Commons RNG
> Issue Type: Improvement
> Affects Versions: 1.0
> Reporter: Alex D Herbert
> Priority: Minor
> Attachments: PoissonSamplerTest.java
>
>
> The Sampler architecture of {{org.apache.commons.rng.sampling.distribution}}
> is nicely written for fast sampling of small dataset sizes. The constructors
> for the samplers do not check the input parameters are valid for the
> respective distributions (in contrast to the old
> {{org.apache.commons.math3.random.distribution}} classes). I assume this is a
> design choice for speed. Thus most of the samplers can be used within a loop
> to sample just one value with very little overhead.
> The {{PoissonSampler}} precomputes log factorial numbers upon construction if
> the mean is above 40. This is done using the {{InternalUtils.FactorialLog}}
> class. As of version 1.0 this internal class is currently only used in the
> {{PoissonSampler}}.
> The cache size is limited to 2*PIVOT (where PIVOT=40). But it creates and
> precomputes the cache every time a PoissonSampler is constructed if the mean
> is above the PIVOT value.
> Why not create this once in a static block for the PoissonSampler?
> {code:java}
> /** {@code log(n!)}. */
> private static final FactorialLog factorialLog;
>
> static
> {
> factorialLog = FactorialLog.create().withCache((int) (2 *
> PoissonSampler.PIVOT));
> }
> {code}
> This will make the construction cost of a new {{PoissonSampler}} negligible.
> If the table is computed dynamically as a static construction method then the
> overhead will be in the first use. Thus the following call will be much
> faster:
> {code:java}
> UniformRandomProvider rng = ...;
> int value = new PoissonSampler(rng, 50).sample();
> {code}
> I have tested this modification (see attached file) and the results are:
> {noformat}
> Mean 40 Single construction ( 7330792) vs Loop construction
> (24334724) (3.319522.2x faster)
> Mean 40 Single construction ( 7330792) vs Loop construction with static
> FactorialLog ( 7990656) (1.090013.2x faster)
> Mean 50 Single construction ( 6390303) vs Loop construction
> (19389026) (3.034132.2x faster)
> Mean 50 Single construction ( 6390303) vs Loop construction with static
> FactorialLog ( 6146556) (0.961857.2x faster)
> Mean 60 Single construction ( 6041165) vs Loop construction
> (21337678) (3.532047.2x faster)
> Mean 60 Single construction ( 6041165) vs Loop construction with static
> FactorialLog ( 5329129) (0.882136.2x faster)
> Mean 70 Single construction ( 6064003) vs Loop construction
> (23963516) (3.951765.2x faster)
> Mean 70 Single construction ( 6064003) vs Loop construction with static
> FactorialLog ( 5306081) (0.875013.2x faster)
> Mean 80 Single construction ( 6064772) vs Loop construction
> (26381365) (4.349935.2x faster)
> Mean 80 Single construction ( 6064772) vs Loop construction with static
> FactorialLog ( 6341274) (1.045591.2x faster)
> {noformat}
> Thus the speed improvements would be approximately 3-4 fold for single use
> Poisson sampling.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)