[ 
https://issues.apache.org/jira/browse/RNG-52?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16625626#comment-16625626
 ] 

Alex D Herbert commented on RNG-52:
-----------------------------------

In short, I agree.

Here're my thoughts:

The fix RNG-51 added a check and Javadoc that a mean above 
{{Integer.Max_VALUE}} is invalid. However the {{SmallMeanPoissonSampler}} has 
no check and the limit of {{Integer.MAX_VALUE}} does cause truncation.

It should be noted that the convolution of two Poisson distributions of means 
{{a}} and {{b}} is a Poisson distribution of mean {{a+b}}, see:

[Convolution of Probability 
Distributions|https://en.wikipedia.org/wiki/List_of_convolutions_of_probability_distributions]

This is how the {{LargeMeanPoissonSampler}} is able to cache pre-computed 
state. It caches state for the mean converted to an integer and when sampling 
adds a sample from a Poisson distribution using the remaining mean fraction.

So in theory any large limit could be chosen and specified in the Javadoc. It 
could then be stated that to obtain a Poisson sample for a mean larger than the 
limit requires obtaining samples from distributions with means below the limit 
using two or more means that sum to the desired mean and summing the resulting 
sample values.

The standard deviation for a Poisson distribution with mean {{0.5 * 
Integer.MAX_VALUE}} would be {{sqrt(0.5 * Integer.MAX_VALUE) = 32768}}. 
Assuming a Poisson distribution is approximated by a Gaussian at large mean 
then the upper range of the sample ({{Integer.MAX_VALUE}}) is over 30000 SD 
units above the mean. The likelihood of this is probably beyond the limit that 
can be computed using {{double}} precision. So {{0.5 * Integer.MAX_VALUE}} is a 
very conservative upper limit. However if a user of the library hits this level 
and gets an exception it should give them pause for thought about what they are 
trying to achieve in their use case.

All this could be added to the Javadoc.

 

> PoissonSampler allows mean above Integer.MAX_VALUE
> --------------------------------------------------
>
>                 Key: RNG-52
>                 URL: https://issues.apache.org/jira/browse/RNG-52
>             Project: Commons RNG
>          Issue Type: Bug
>          Components: sampling
>    Affects Versions: 1.1
>            Reporter: Alex D Herbert
>            Priority: Major
>             Fix For: 1.2
>
>
> The {{PoissonSampler}} is limited to returning an integer by the interface of 
> the {{DiscreteSampler}}. As it stands an input mean above 
> {{Integer.MAX_VALUE}} is allowed although it makes no sense as the Poisson 
> distribution is significantly truncated.
> The algorithm of the {{SmallMeanPoissonSampler}} sets a limit on the returned 
> sample of {{Integer.MAX_VALUE}}. The algorithm is valid although run-time 
> would be impractical due to the nature of the algorithm. However at high mean 
> (>40) the end user is expected to use either the {{LargeMeanPoissonSampler}} 
> directly or the {{PoissonSampler}} which chooses the appropriate large mean 
> algorithm.
> However the current {{LargeMeanPoissonSampler}} uses 
> {{(int)Math.floor(mean)}} during initialisation and any mean above 
> {{Integer.MAX_VALUE}} would therefore be unsupported.
> I propose to add this to the constructor of each Poisson sampler:
> {code:java}
> if (mean > Integer.MAX_VALUE) {
>     throw new IllegalArgumentException(mean + " > " + Integer.MAX_VALUE);
> }
> {code}
> with documentation
> {code:java}
>  * @throws IllegalArgumentException if {@code mean <= 0} or {@code mean > 
> }{@link Integer.MAX_VALUE}.
> {code}
> It is noted that the limit of {{Integer.MAX_VALUE}} would allow the samples 
> to reflect the Poisson distribution below that level but truncate it above 
> that level to represent the remaining cumulative histogram at the single 
> point of {{Integer.MAX_VALUE}}. This maintains the functionality of the 
> sampler within the contract of the integer value returned by 
> {{DiscreteSampler}}.
> In practice the Poisson distribution is unlikely to be used at such a high 
> mean; in this case it is appropriate to use a Gaussian approximation to the 
> Poisson.
> Note: Currently there is no code coverage from tests for the 
> \{{LargeMeanPoissonSampler}} checking if the mean is <= 0. Tests should be 
> added to check the constructor does throw when a bad mean is used.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to