[ https://issues.apache.org/jira/browse/RNG-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16832522#comment-16832522 ]

Alex D Herbert commented on RNG-91:
-----------------------------------

The notes on the Alias method page on Wikipedia provide an alternative, faster 
sampling method for specially prepared probability distributions. The method is 
from:
{noformat}
George Marsaglia, Wai Wan Tsang, Jingbo Wang (2004) Fast Generation of Discrete 
Random Variables. Journal of Statistical Software. Vol. 11, Issue. 3, pp. 1-11.
{noformat}
[JSS Vol 11, Issue 3|http://dx.doi.org/10.18637/jss.v011.i03]

The paper gives a general method that requires input probabilities expressed as 
unsigned 30-bit integers. The probabilities thus have an assumed divisor of 
2^30 and must sum to 2^30. They are then efficiently tabulated for fast look-up 
using the 5 base-64 digits of the 30-bit number. Sampling then requires 1 
random integer per sample.
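From my reading of the paper, the construction and lookup can be sketched 
roughly as below. All class and method names are mine; this is an illustrative 
sketch of the digit-table idea, not the implementation I will submit:

```java
import java.util.concurrent.ThreadLocalRandom;

/**
 * Sketch of the Marsaglia-Tsang-Wang table method (all names are mine).
 * Probabilities are unsigned 30-bit integers summing to 2^30. Each is split
 * into 5 base-64 digits; digit k contributes that many copies of the sample
 * value to table k. A single 30-bit uniform integer then selects a table and
 * an entry in it.
 */
class MarsagliaTsangWangSketch {
    private final int[][] tables = new int[5][];
    private final long[] limits = new long[5];

    MarsagliaTsangWangSketch(int[] p30) {
        // Table k holds sum(digit k of p) entries; digit weights are 2^24 .. 2^0.
        final int[] sizes = new int[5];
        for (final int p : p30) {
            for (int k = 0; k < 5; k++) {
                sizes[k] += (p >>> (24 - 6 * k)) & 63;
            }
        }
        for (int k = 0; k < 5; k++) {
            tables[k] = new int[sizes[k]];
        }
        // Fill each table with digit-many copies of the sample value.
        final int[] pos = new int[5];
        for (int i = 0; i < p30.length; i++) {
            for (int k = 0; k < 5; k++) {
                final int digit = (p30[i] >>> (24 - 6 * k)) & 63;
                for (int c = 0; c < digit; c++) {
                    tables[k][pos[k]++] = i;
                }
            }
        }
        // Cumulative portion of the 2^30 range covered by each table.
        long limit = 0;
        for (int k = 0; k < 5; k++) {
            limit += (long) sizes[k] << (24 - 6 * k);
            limits[k] = limit;
        }
    }

    /** One random 30-bit integer per sample. */
    int sample() {
        final int j = ThreadLocalRandom.current().nextInt(1 << 30);
        long base = 0;
        for (int k = 0; k < 5; k++) {
            if (j < limits[k]) {
                return tables[k][(int) ((j - base) >> (24 - 6 * k))];
            }
            base = limits[k];
        }
        // Unreachable when the probabilities sum to exactly 2^30.
        throw new IllegalStateException("Probabilities must sum to 2^30");
    }

    public static void main(String[] args) {
        // Fair coin: two outcomes, each with probability 2^29 / 2^30 = 1/2.
        final MarsagliaTsangWangSketch coin =
            new MarsagliaTsangWangSketch(new int[] {1 << 29, 1 << 29});
        System.out.println(coin.sample()); // 0 or 1
    }
}
```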

The paper provides software to compute the probability distributions and the 
look-up tables for the Poisson and Binomial distributions. Here are the results 
for this method against the current method of choice for the 8 previously 
listed use cases.

Note: In these results I have subtracted the JMH overhead for timing the 
generation of an int value, which is approximately 2 nanoseconds. I did not do 
this for previous results, so each method now appears faster (by ~2ns) and the 
relative differences between them are slightly larger. Relative differences 
are important for the analysis of the number of additional samples required 
after construction before a performance gain.
|randomSourceName|mean|samplerType|Method|Score (ns)|Error (ns)|Median (ns)|
|WELL_44497_B|0.500|MarsagliaTsangWangPoissonSampler|sample|11.355|0.034|11.359|
|WELL_44497_B|0.500|OriginalKempSmallMeanPoissonSampler|sample|23.195|0.029|23.197|
|WELL_44497_B|0.500|OriginalKempSmallMeanPoissonSampler|singleSample|70.410|0.382|70.389|
|WELL_44497_B|0.500|MarsagliaTsangWangPoissonSampler|singleSample|833.884|5.302|834.211|
|WELL_44497_B|2.000|MarsagliaTsangWangPoissonSampler|sample|12.877|0.053|12.876|
|WELL_44497_B|2.000|OriginalKempSmallMeanPoissonSampler|sample|36.199|17.786|34.115|
|WELL_44497_B|2.000|OriginalKempSmallMeanPoissonSampler|singleSample|78.477|0.573|78.407|
|WELL_44497_B|2.000|MarsagliaTsangWangPoissonSampler|singleSample|1129.386|5.113|1128.672|
|WELL_44497_B|32.000|MarsagliaTsangWangPoissonSampler|sample|13.290|0.070|13.285|
|WELL_44497_B|32.000|KempSmallMeanPoissonSamplerGuideTable|sample|23.371|0.028|23.374|
|WELL_44497_B|32.000|OriginalKempSmallMeanPoissonSampler|singleSample|204.599|0.959|204.693|
|WELL_44497_B|32.000|MarsagliaTsangWangPoissonSampler|singleSample|4570.998|42.982|4566.936|
|XO_RO_SHI_RO_128_PLUS|0.500|MarsagliaTsangWangPoissonSampler|sample|4.006|0.142|3.993|
|XO_RO_SHI_RO_128_PLUS|0.500|OriginalKempSmallMeanPoissonSampler|sample|9.107|0.073|9.109|
|XO_RO_SHI_RO_128_PLUS|0.500|OriginalKempSmallMeanPoissonSampler|singleSample|54.583|0.122|54.594|
|XO_RO_SHI_RO_128_PLUS|0.500|MarsagliaTsangWangPoissonSampler|singleSample|948.870|358.101|960.837|
|XO_RO_SHI_RO_128_PLUS|2.000|MarsagliaTsangWangPoissonSampler|sample|4.150|0.005|4.152|
|XO_RO_SHI_RO_128_PLUS|2.000|OriginalKempSmallMeanPoissonSampler|sample|17.328|0.128|17.328|
|XO_RO_SHI_RO_128_PLUS|2.000|SmallMeanPoissonSampler|singleSample|57.654|0.051|57.657|
|XO_RO_SHI_RO_128_PLUS|2.000|MarsagliaTsangWangPoissonSampler|singleSample|1129.653|1.310|1129.442|
|XO_RO_SHI_RO_128_PLUS|32.000|MarsagliaTsangWangPoissonSampler|sample|5.838|0.012|5.842|
|XO_RO_SHI_RO_128_PLUS|32.000|KempSmallMeanPoissonSamplerGuideTable|sample|10.097|5.697|9.423|
|XO_RO_SHI_RO_128_PLUS|32.000|SmallMeanPoissonSampler|singleSample|117.201|0.128|117.187|
|XO_RO_SHI_RO_128_PLUS|32.000|MarsagliaTsangWangPoissonSampler|singleSample|4550.316|11.805|4549.130|

So this method is extremely fast, as expected given it requires a single int 
per sample. It does, however, take quite a long time to create the sampler. So 
how long does it take to pay back that cost versus a sampler that is fast to 
create? In the table below, c and d are the construct-plus-one-sample times, a 
and b are the per-sample times, and x = (d - c) / (a - b) is the number of 
samples at which the two total costs cross:
|randomSourceName|Sampler|Mean|c (singleSample, ns)|a (sample, ns)|Sampler|d (singleSample, ns)|b (sample, ns)|x (break-even samples)|
|WELL_44497_B|OriginalKempSmallMeanPoissonSampler|0.50|70.41|23.20|MarsagliaTsangWangPoissonSampler|833.88|11.35|64.48|
|XO_RO_SHI_RO_128_PLUS|OriginalKempSmallMeanPoissonSampler|0.50|54.58|9.11|MarsagliaTsangWangPoissonSampler|948.87|4.01|175.33|
|WELL_44497_B|OriginalKempSmallMeanPoissonSampler|2.00|78.48|36.20|MarsagliaTsangWangPoissonSampler|1,129.39|12.88|45.06|
|XO_RO_SHI_RO_128_PLUS|OriginalKempSmallMeanPoissonSampler|2.00|57.65|17.33|MarsagliaTsangWangPoissonSampler|1,129.65|4.15|81.35|
|WELL_44497_B|OriginalKempSmallMeanPoissonSampler|32.00|204.60|159.27|MarsagliaTsangWangPoissonSampler|4,571.00|13.29|29.91|
|XO_RO_SHI_RO_128_PLUS|SmallMeanPoissonSampler|32.00|117.20|89.74|MarsagliaTsangWangPoissonSampler|4,550.32|5.84|52.84|
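As a sanity check on the x column (variable names are mine; the values are 
copied from the first row of the table):

```java
class BreakEven {
    public static void main(String[] args) {
        // First row above: WELL_44497_B, mean 0.5.
        double c = 70.41;  // OriginalKempSmallMeanPoissonSampler: construct + 1 sample (ns)
        double a = 23.20;  // OriginalKempSmallMeanPoissonSampler: per sample (ns)
        double d = 833.88; // MarsagliaTsangWangPoissonSampler: construct + 1 sample (ns)
        double b = 11.35;  // MarsagliaTsangWangPoissonSampler: per sample (ns)
        // Total costs are equal when c + x * a == d + x * b, i.e.
        // x = (d - c) / (a - b).
        double x = (d - c) / (a - b);
        System.out.printf("%.2f%n", x); // ~64.4, matching the x column (64.48)
    }
}
```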

So this takes longer to pay back than using the guide table sampler, but it is 
worth doing for any simulation of more than a few dozen samples. Note: the 
break-even point comes earlier when the reference sampler is slow.

Given the performance of this new sampler, it removes the use case for the 
guide table Poisson sampler (or anything else I have tested). That sampler had 
lower set-up costs but cannot match the sampling performance as it requires 1 
double per sample rather than 1 int.

I will raise a PR for this new sampler.

*Note on storage requirements*

The method employs 5 look-up tables. Each table holds sample indices, so the 
tables can use the byte, short or int data type depending on the maximum 
number of samples (2^8, 2^16, or 2^30). The maximum of 2^30 samples occurs 
when the probability of each is 1/2^30. The worst-case scenario for storage is 
a uniform distribution at the maximum sample size. I compute this to be capped 
at 0.06MB for n <= 2^8, 17.0MB for n <= 2^16, and 4.3GB for n <= 2^30. In 
practice the requirements are much lower.
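As a back-of-the-envelope check on a figure like the 17MB cap (my own 
arithmetic, not the paper's derivation): the total number of table entries is 
the sum of the five base-64 digits over all outcomes, and for 2^16 outcomes the 
adversarial case has each probability near 2^30 / 2^16 = 2^14 with the low 
digits maximised, e.g. 0x3FFF:

```java
class WorstCaseStorage {
    /** Sum of the five base-64 digits of a 30-bit probability. */
    static int digitSum(int p) {
        int sum = 0;
        for (int k = 0; k < 5; k++) {
            sum += (p >>> (6 * k)) & 63;
        }
        return sum;
    }

    public static void main(String[] args) {
        // 2^16 outcomes, each with probability ~2^14 = 2^30 / 2^16.
        // 0x3FFF has base-64 digits (0, 0, 3, 63, 63): digit sum 129.
        // (This ignores the one outcome adjusted so the sum is exactly 2^30.)
        long entries = (1L << 16) * digitSum(0x3FFF);
        // Entries are short (2 bytes) since the sample index needs 16 bits.
        System.out.printf("%.1f MB%n", entries * 2 / 1e6); // ~16.9 MB
    }
}
```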

Here are the storage requirements for example Poisson distributions:
||mean||table size (entries)||storage (kB)||
|0.25|882|0.88|
|0.5|1135|1.14|
|1|1200|1.20|
|2|1451|1.45|
|4|1955|1.96|
|8|2961|2.96|
|16|4410|4.41|
|32|6115|6.11|
|64|8499|8.50|
|128|11528|11.53|
|256|15935|15.94|
|512|20912|41.82|
|1024|30614|61.23|

The shift from 1 to 2 byte storage occurs at mean 512.

Note: The methods in the paper support a Poisson mean up to 1941 before 
numerical error occurs, so this is suitable as a SmallMean Poisson sampler. A 
cap of 1024 for the mean appears sensible.

 

> Kemp small mean poisson sampler
> -------------------------------
>
>                 Key: RNG-91
>                 URL: https://issues.apache.org/jira/browse/RNG-91
>             Project: Commons RNG
>          Issue Type: New Feature
>          Components: sampling
>    Affects Versions: 1.3
>            Reporter: Alex D Herbert
>            Assignee: Alex D Herbert
>            Priority: Minor
>             Fix For: 1.3
>
>         Attachments: kemp.jpg, poisson-samplers.jpg
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> The current algorithm for the {{SmallMeanPoissonSampler}} is used to generate 
> Poisson samples for any mean up to 40. The sampler requires approximately n 
> samples from an RNG to generate 1 Poisson deviate for a mean of n.
> The [Kemp (1981)|https://www.jstor.org/stable/2346348] algorithm requires 1 
> sample from the RNG and then accrues a cumulative probability using a 
> recurrence relation to compute each successive Poisson probability:
> {noformat}
> p(n+1) = p(n) * mean / (n+1)
> {noformat}
> The full algorithm is here:
> {code:java}
>     mean = ...;
>     final double p0 = Math.exp(-mean);
>     @Override
>     public int sample() {
>         double u = rng.nextDouble();
>         int x = 0;
>         double p = p0;
>         // The algorithm listed in Kemp (1981) does not check that the
>         // rolling probability is positive. This check is added to ensure
>         // no errors when the limit of the summation 1 - sum(p(x)) is above
>         // 0 due to cumulative error in floating point arithmetic.
>         while (u > p && p != 0) {
>             u -= p;
>             x++;
>             p = p * mean / x;
>         }
>         return x;
>     }
> {code}
> The limit for the sampler is set by the ability to compute p0: 
> Math.exp(-mean) returns 0 when the mean is larger than approximately 744.440.
> A conservative limit of 700 sets an initial probability p0 of 
> 9.85967654375977E-305. When run through the summation series for the limit (u 
> initialised to 1) the result when the summation ends (p is zero) leaves u = 
> 3.335439283623915E-15. This is close to the expected tolerance for floating 
> point error (Note: 1 - Math.nextDown(1) = 1.1102230246251565E-16).
> Using a mean of 10 leaves u = 4.988586742717954E-17. So smaller means have a 
> similar error. The error in the cumulative sum is expected to result in 
> truncation of the long tail of the Poisson distribution (which should be 
> bounded at infinity).
> This sampler should outperform the current {{SmallMeanPoissonSampler}} as it 
> requires 1 uniform deviate per sample.
> Note that the {{SmallMeanPoissonSampler}} uses a limit for the mean of 
> Integer.MAX_VALUE / 2. This should be updated since it also relies on p0 
> being above zero.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
