[
https://issues.apache.org/jira/browse/RNG-146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17368969#comment-17368969
]
Alex Herbert commented on RNG-146:
----------------------------------
OK. So here are some options:
# Attempt to calculate the bounds of the cumulative distribution using
quantiles close to 0 and 1. If these are infinite then the distribution is
clipped, so throw an exception.
# Do nothing and leave it to the user to figure out.
# Validate that the standard deviation is finite, just so the sampler does not
return NaN.
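Option 3 is the cheapest of these to sketch. The following is a minimal illustration, not the library's code: the class StdDevValidation and its methods are hypothetical names, and the scaling mean + standardDeviation * z is an assumption about how a normalized sample is transformed, consistent with the NaN described in the issue.

```java
final class StdDevValidation {
    private StdDevValidation() {}

    /** Rejects zero, negative, NaN and infinite values (hypothetical helper). */
    static double requireFinitePositive(double standardDeviation) {
        // !(x > 0) is true for NaN as well as for x <= 0.
        if (!(standardDeviation > 0) || Double.isInfinite(standardDeviation)) {
            throw new IllegalArgumentException(
                "standard deviation is not strictly positive and finite: " +
                standardDeviation);
        }
        return standardDeviation;
    }

    /** Assumed scaling of a normalized sample z (hypothetical helper). */
    static double scale(double mean, double standardDeviation, double z) {
        return mean + standardDeviation * z;
    }
}
```

With an infinite standard deviation and z = 0 the product in scale is infinity * 0, which is NaN in IEEE 754 arithmetic, so rejecting the infinite value in the constructor removes that case.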
I think a precedent is set in the LargeMeanPoissonSampler to avoid truncated
distributions. This is bounded to return an int value. When the sampler has a
very large mean the Poisson distribution may be truncated, so a limit is set at
half of the maximum integer value. Above this the constructor throws.
With a mean of 2^30 the CDF(x=2^31) for a Poisson is 1 (according to Matlab).
At this point the std dev is roughly sqrt(mean), so 32768. The limit of the
return value (about 2^31) is thus 65536 standard deviations above zero, or
32768 standard deviations above the mean. This is quite conservative. However
at this point you may as well use a Gaussian to create samples and just clip
the lower bound to 0. It is not likely to affect an end user.
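The arithmetic above can be checked directly. This sketch assumes the Poisson standard deviation is sqrt(mean) and takes Integer.MAX_VALUE as the limit of the return value; PoissonLimit and its methods are hypothetical names used only for illustration. It reports the distance to the limit measured both from the mean and from zero.

```java
final class PoissonLimit {
    private PoissonLimit() {}

    /** Standard deviation of a Poisson distribution with the given mean. */
    static double stdDev(double mean) {
        return Math.sqrt(mean);
    }

    /** Distance from the mean to Integer.MAX_VALUE, in standard deviations. */
    static double sigmasFromMean(double mean) {
        return (Integer.MAX_VALUE - mean) / stdDev(mean);
    }

    /** Distance from zero to Integer.MAX_VALUE, in standard deviations. */
    static double sigmasFromZero(double mean) {
        return Integer.MAX_VALUE / stdDev(mean);
    }

    public static void main(String[] args) {
        final double mean = 1 << 30; // 2^30, the mean used in the comment
        System.out.println("std dev          = " + stdDev(mean));
        System.out.println("sigmas from mean = " + sigmasFromMean(mean));
        System.out.println("sigmas from zero = " + sigmasFromZero(mean));
    }
}
```

Either way the limit is tens of thousands of standard deviations away, which is the sense in which the existing Poisson limit is conservative.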
Should a similar trivial limit be set for a scaled Gaussian sampler? The bounds
set by a finite double should be a very large number of standard deviations
away from the mean:
{code:java}
public GaussianSampler(NormalizedGaussianSampler normalized,
                       double mean,
                       double standardDeviation) {
    if (standardDeviation <= 0) {
        throw new IllegalArgumentException(
            "standard deviation is not strictly positive: " +
                standardDeviation);
    }
    // Check bounds
    if (mean - 10 * standardDeviation == Double.NEGATIVE_INFINITY ||
        mean + 10 * standardDeviation == Double.POSITIVE_INFINITY) {
        throw new IllegalArgumentException("Possible truncation ...");
    }
    this.normalized = normalized;
    this.mean = mean;
    this.standardDeviation = standardDeviation;
}
{code}
Here I used 10 times the std dev. In double precision the Matlab normal CDF
rounds to 1 from 9 standard deviations onwards, so it cannot resolve the tail
beyond that:
{noformat}
>> pd = makedist('Norm');
>> format long;
>> pd.cdf([5:1:10])'
ans =
0.999999713348428
0.999999999013412
0.999999999998720
0.999999999999999
1.000000000000000
1.000000000000000
{noformat}
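To see which parameters the 10 standard deviation bounds check in the constructor above would reject, here is a minimal sketch that lifts the condition into a standalone method. TruncationCheck and wouldTruncate are hypothetical names; the condition itself is copied from the proposed constructor.

```java
final class TruncationCheck {
    private TruncationCheck() {}

    /**
     * True if mean +/- 10 standard deviations overflows to an infinite value.
     * A NaN argument makes both comparisons false, so NaN is not detected here.
     */
    static boolean wouldTruncate(double mean, double standardDeviation) {
        return mean - 10 * standardDeviation == Double.NEGATIVE_INFINITY ||
               mean + 10 * standardDeviation == Double.POSITIVE_INFINITY;
    }

    public static void main(String[] args) {
        // Ordinary parameters are far from the limits of a double.
        System.out.println(wouldTruncate(0, 1));
        // An infinite standard deviation is rejected by the same condition.
        System.out.println(wouldTruncate(0, Double.POSITIVE_INFINITY));
        // 10 * Double.MAX_VALUE overflows to infinity.
        System.out.println(wouldTruncate(0, Double.MAX_VALUE));
        // A huge mean plus a large spread overflows on the upper side only.
        System.out.println(wouldTruncate(Double.MAX_VALUE, Double.MAX_VALUE / 16));
    }
}
```

Only parameters close to the largest finite double trip the check, so it would also cover the infinite standard deviation case without affecting realistic use.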
> GaussianSampler should not allow infinite standard deviation
> ------------------------------------------------------------
>
> Key: RNG-146
> URL: https://issues.apache.org/jira/browse/RNG-146
> Project: Commons RNG
> Issue Type: Bug
> Components: sampling
> Affects Versions: 1.3
> Reporter: Alex Herbert
> Priority: Trivial
>
> The GaussianSampler requires that the standard deviation is strictly
> positive, but it allows an infinite value. This will produce a NaN output if
> the NormalizedGaussianSampler returns 0:
> {code:java}
> @Test
> public void testInfiniteStdDev() {
>     NormalizedGaussianSampler gauss = new NormalizedGaussianSampler() {
>         @Override
>         public double sample() {
>             return 0;
>         }
>     };
>     GaussianSampler s = new GaussianSampler(gauss, 0,
>                                             Double.POSITIVE_INFINITY);
>     Assert.assertEquals(Double.NaN, s.sample(), 0.0);
> }
> {code}
> A fix is to require the standard deviation is finite.