[ 
https://issues.apache.org/jira/browse/RNG-146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17368969#comment-17368969
 ] 

Alex Herbert commented on RNG-146:
----------------------------------

OK. So here are some options:
 # Attempt to calculate the bounds of the cumulative distribution using 
quantiles close to 0 and 1. If these are infinite then the distribution is 
clipped, so throw an exception.
 # Do nothing and leave it to the user to figure out.
 # Validate that the standard deviation is finite, just so the sampler does 
not return NaN.
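
As a minimal sketch of the third option: the class and method names below are hypothetical and not part of Commons RNG. The check is written as a negated range test so that NaN is rejected along with zero, negative and infinite values.

```java
/** Hypothetical helper for illustration only; not part of Commons RNG. */
final class StdDevValidation {
    private StdDevValidation() {}

    /**
     * Returns the argument if it is strictly positive and finite,
     * otherwise throws.
     */
    static double requireStrictlyPositiveFinite(double standardDeviation) {
        // Negated form: any comparison with NaN is false, so NaN is rejected too.
        if (!(standardDeviation > 0 && standardDeviation < Double.POSITIVE_INFINITY)) {
            throw new IllegalArgumentException(
                "standard deviation is not strictly positive and finite: " + standardDeviation);
        }
        return standardDeviation;
    }
}
```

A constructor could then assign {{this.standardDeviation = StdDevValidation.requireStrictlyPositiveFinite(standardDeviation);}}.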

I think a precedent is set in the LargeMeanPoissonSampler to avoid truncated 
distributions. The sampler is bound to return an int value. When the mean is 
very large the Poisson distribution may be truncated at the maximum integer 
value, so a limit on the mean is set at half of the maximum integer value. 
Above this limit the constructor throws.

With a mean of 2^30 the CDF(x=2^31) for a Poisson is 1 (according to Matlab). 
At this point the std dev is sqrt(mean), so 32768. The limit of the return 
value is thus 65536 standard deviations above zero, or 32768 standard 
deviations above the mean. This is quite conservative. However, at this point 
you may as well use a Gaussian to create samples and just clip the lower bound 
to 0. It is not likely to affect an end user.
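
The arithmetic for the Poisson case can be checked with a few lines of Java. This is only a sketch; the class and method names are made up for illustration, and the limit is taken as 2^31, just above Integer.MAX_VALUE.

```java
/** Hypothetical helper for illustration only; not part of Commons RNG. */
final class PoissonHeadroom {
    private PoissonHeadroom() {}

    /** Number of Poisson standard deviations between the mean and a limit. */
    static double standardDeviationsAboveMean(double mean, double limit) {
        // For a Poisson distribution the variance equals the mean.
        return (limit - mean) / Math.sqrt(mean);
    }

    public static void main(String[] args) {
        double mean = 0x1.0p30;   // 2^30
        double limit = 0x1.0p31;  // 2^31, just above Integer.MAX_VALUE
        System.out.println(Math.sqrt(mean));                           // 32768.0
        System.out.println(limit / Math.sqrt(mean));                   // 65536.0
        System.out.println(standardDeviationsAboveMean(mean, limit));  // 32768.0
    }
}
```

So the int limit lies 65536 standard deviations from zero and 32768 standard deviations above a mean of 2^30.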

Should a similar trivial limit be set for a scaled Gaussian sampler? The bounds 
set by a finite double should be a very large number of standard deviations 
away from the mean:
{code:java}
public GaussianSampler(NormalizedGaussianSampler normalized,
                       double mean,
                       double standardDeviation) {
    if (standardDeviation <= 0) {
        throw new IllegalArgumentException(
            "standard deviation is not strictly positive: " + standardDeviation);
    }

    // Check bounds
    if (mean - 10 * standardDeviation == Double.NEGATIVE_INFINITY ||
        mean + 10 * standardDeviation == Double.POSITIVE_INFINITY) {
        throw new IllegalArgumentException("Possible truncation ...");
    }

    this.normalized = normalized;
    this.mean = mean;
    this.standardDeviation = standardDeviation;
}
{code}
Here I used 10 times the standard deviation. In double precision the Matlab 
normal CDF already returns 1 from 9 standard deviations upward:
{noformat}
>> pd = makedist('Norm');
>> format long;
>> pd.cdf([5:1:10])'

ans =

   0.999999713348428
   0.999999999013412
   0.999999999998720
   0.999999999999999
   1.000000000000000
   1.000000000000000
{noformat}
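
To see which inputs the bounds check in the constructor above would reject, the test can be pulled out into a stand-alone predicate. This is a sketch only: the class and method names are hypothetical, and the factor of 10 is the one used in the constructor snippet.

```java
/** Hypothetical helper for illustration only; not part of Commons RNG. */
final class GaussianBounds {
    private GaussianBounds() {}

    /**
     * Returns true if mean +/- 10 standard deviations overflows
     * the range of a finite double.
     */
    static boolean isPossiblyTruncated(double mean, double standardDeviation) {
        return mean - 10 * standardDeviation == Double.NEGATIVE_INFINITY ||
               mean + 10 * standardDeviation == Double.POSITIVE_INFINITY;
    }

    public static void main(String[] args) {
        // A standard normal is far from the limits of a double.
        System.out.println(isPossiblyTruncated(0, 1));                         // false
        // An infinite standard deviation is caught by the same check.
        System.out.println(isPossiblyTruncated(0, Double.POSITIVE_INFINITY));  // true
        // Both arguments are finite, but 1e308 + 10 * 1e307 overflows to infinity.
        System.out.println(isPossiblyTruncated(1e308, 1e307));                 // true
    }
}
```

Note that the check also covers the infinite standard deviation from the issue description, so a separate finite test would not be needed for that case.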


> GaussianSampler should not allow infinite standard deviation
> ------------------------------------------------------------
>
>                 Key: RNG-146
>                 URL: https://issues.apache.org/jira/browse/RNG-146
>             Project: Commons RNG
>          Issue Type: Bug
>          Components: sampling
>    Affects Versions: 1.3
>            Reporter: Alex Herbert
>            Priority: Trivial
>
> The GaussianSampler requires that the standard deviation is strictly 
> positive, but it allows an infinite value. This will produce a NaN output if 
> the NormalizedGaussianSampler returns 0:
> {code:java}
> @Test
> public void testInfiniteStdDev() {
>     NormalizedGaussianSampler gauss = new NormalizedGaussianSampler() {
>         @Override
>         public double sample() {
>             return 0;
>         }
>     };
>     GaussianSampler s = new GaussianSampler(gauss, 0, Double.POSITIVE_INFINITY);
>     Assert.assertEquals(Double.NaN, s.sample(), 0.0);
> }
> {code}
> A fix is to require the standard deviation is finite.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
