[ https://issues.apache.org/jira/browse/MATH-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13671779#comment-13671779 ]
Phil Steitz commented on MATH-984:
----------------------------------
One more comment on the getNextValue() implementation. If we just want
straight inversion-based sampling, that is available for free from the
superclass, AbstractRealDistribution. To use it, we would just need to
reverse the roles of sample() and getNextValue(), i.e., have getNextValue()
call sample() and drop the override of sample() in EmpiricalDistribution.
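
To make the delegation concrete, here is a minimal stand-alone sketch of
inversion-based sampling. It is not the Commons Math code: the class name
InversionSketch, the seed parameter, and the piecewise-linear quantile
function are assumptions made only for illustration. The one part taken
from the comment above is the shape of the calls: getNextValue() delegates
to sample(), and sample() feeds a uniform deviate to the inverse CDF.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical illustration only; not the Commons Math implementation. */
final class InversionSketch {
    private final double[] sorted;
    private final Random rng;

    InversionSketch(double[] data, long seed) {
        if (data.length == 0) {
            throw new IllegalArgumentException("data must not be empty");
        }
        this.sorted = data.clone();
        Arrays.sort(this.sorted);
        this.rng = new Random(seed);
    }

    /** Piecewise-linear empirical quantile function for p in [0, 1]. */
    double inverseCumulativeProbability(double p) {
        if (!(p >= 0.0 && p <= 1.0)) {
            throw new IllegalArgumentException("p out of range: " + p);
        }
        double pos = p * (sorted.length - 1);
        int lo = (int) Math.floor(pos);
        int hi = Math.min(lo + 1, sorted.length - 1);
        double frac = pos - lo;
        // Interpolate between neighbouring order statistics, so the
        // result never leaves [min(data), max(data)].
        return sorted[lo] + frac * (sorted[hi] - sorted[lo]);
    }

    /** Inversion sampling: push a uniform deviate through the inverse CDF. */
    double sample() {
        return inverseCumulativeProbability(rng.nextDouble());
    }

    /** Roles reversed as suggested: getNextValue() simply calls sample(). */
    double getNextValue() {
        return sample();
    }
}
```

Because every draw passes through the quantile function, values generated
this way stay inside the range of the observed data, which is the property
the reporter asks for below.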
> Incorrect (bugged) generating function getNextValue() in
> .random.EmpiricalDistribution
> --------------------------------------------------------------------------------------
>
> Key: MATH-984
> URL: https://issues.apache.org/jira/browse/MATH-984
> Project: Commons Math
> Issue Type: Bug
> Affects Versions: 3.2, 3.1.1
> Environment: all
> Reporter: Radoslav Tsvetkov
>
> The generating function getNextValue() in
> org.apache.commons.math3.random.EmpiricalDistribution
> generates wrong values for all distributions that are single-tailed or
> bounded, for example data resembling exponential or lognormal
> distributions.
> The problem can easily be seen in the code and tested.
> In the latest version of the code
> ...
> return getKernel(stats).sample();
> ...
> it samples from a Gaussian distribution to "smooth" within the bin. A
> Gaussian distribution is unbounded, so it sometimes generates numbers
> outside the bin. When this happens in the last bin, it generates wrong
> numbers.
> For example, for empirical non-negative data it will generate negative rubbish.
> Additionally, the algorithm simply returns the mean value of the bin
> when the bin contains only one value! This makes the generating function
> unusable for heavy-tailed distributions with a small number of values (for
> example, computer network traffic).
> Finally, the use of Gaussian smoothing within the bin will greatly change
> some properties of the empirical distribution.
> The method should be reworked to be applicable to real data, which
> often have limited ranges (either non-negative or bounded on both sides).
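
The reporter's point about the within-bin kernel can be reproduced with a
small stand-alone sketch. Nothing here is Commons Math code: the class
BinKernelSketch, its method names, and the example bin parameters are
hypothetical. gaussianDraw() mimics an unbounded Gaussian kernel, and
truncatedDraw() shows one possible way (plain rejection sampling) to keep
draws inside the bin; it is not necessarily the fix adopted for this issue.

```java
import java.util.Random;

/** Hypothetical illustration only; not the Commons Math implementation. */
final class BinKernelSketch {

    /** Unbounded Gaussian draw: can land anywhere on the real line. */
    static double gaussianDraw(Random rng, double mean, double sd) {
        return mean + sd * rng.nextGaussian();
    }

    /**
     * Gaussian draw restricted to [lower, upper] by rejection sampling.
     * Assumes the mean lies inside the bin, so acceptance is never rare.
     */
    static double truncatedDraw(Random rng, double mean, double sd,
                                double lower, double upper) {
        if (!(sd > 0.0) || !(lower < upper) || mean < lower || mean > upper) {
            throw new IllegalArgumentException("invalid bin or kernel parameters");
        }
        double x;
        do {
            x = gaussianDraw(rng, mean, sd);
        } while (x < lower || x > upper); // redraw anything outside the bin
        return x;
    }

    /** Counts how many of n unbounded draws fall outside [lower, upper]. */
    static int countOutside(Random rng, double mean, double sd,
                            double lower, double upper, int n) {
        int outside = 0;
        for (int i = 0; i < n; i++) {
            double x = gaussianDraw(rng, mean, sd);
            if (x < lower || x > upper) {
                outside++;
            }
        }
        return outside;
    }
}
```

For a bin covering [0, 1] with within-bin mean 0.2 and standard deviation
0.3, a sizeable share of the unbounded draws is negative, while the
truncated draws all stay inside the bin.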