[
https://issues.apache.org/jira/browse/MATH-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13671377#comment-13671377
]
Radoslav Tsvetkov commented on MATH-984:
----------------------------------------
The simplest way would be to use the inverse CDF (quantile function) with
uniformly distributed input. Unfortunately, it also seems to have problems in
the kernel calculations.
I'm getting a NotStrictlyPositiveException from
EmpiricalDistribution.getKernel(EmpiricalDistribution.java:846).
I thought of looking at which algorithms R uses:
http://en.wikibooks.org/wiki/R_Programming/Random_Number_Generation
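
For reference, the inverse-CDF idea mentioned above can be sketched without any
kernel step at all. This is only a minimal illustration in plain Java, not the
Commons Math implementation and not a proposed patch: the class name
EmpiricalQuantileSampler, its methods, and the sample data are all hypothetical,
and linear interpolation between order statistics is just one of several
possible quantile definitions. The point it shows is that mapping a uniform
draw through the empirical quantile function keeps every sample between the
smallest and largest observation.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical helper (not part of Commons Math): inverse-CDF sampling from raw data. */
final class EmpiricalQuantileSampler {
    private final double[] sorted; // ascending copy of the observations
    private final Random rng;

    EmpiricalQuantileSampler(double[] data, Random rng) {
        if (data.length == 0) {
            throw new IllegalArgumentException("data must not be empty");
        }
        this.sorted = data.clone();
        Arrays.sort(this.sorted);
        this.rng = rng;
    }

    /** Empirical quantile for p in [0, 1], linearly interpolated between order statistics. */
    double quantile(double p) {
        if (p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("p must be in [0, 1]");
        }
        double pos = p * (sorted.length - 1);           // fractional index into the sorted data
        int lo = (int) Math.floor(pos);
        int hi = Math.min(lo + 1, sorted.length - 1);
        double frac = pos - lo;
        return sorted[lo] + frac * (sorted[hi] - sorted[lo]);
    }

    /** Draws u uniformly from [0, 1) and maps it through the empirical quantile function. */
    double sample() {
        return quantile(rng.nextDouble());
    }

    public static void main(String[] args) {
        double[] data = {0.2, 0.4, 0.5, 1.1, 2.3, 7.9}; // made-up non-negative data
        EmpiricalQuantileSampler sampler =
                new EmpiricalQuantileSampler(data, new Random(42L));
        for (int i = 0; i < 5; i++) {
            System.out.println(sampler.sample());
        }
    }
}
```

Because no unbounded kernel is involved, non-negative input can only produce
non-negative samples. The trade-off is that nothing outside the observed range
is ever generated, which may or may not be what a given application wants.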
> Incorrect (bugged) generating function getNextValue() in
> .random.EmpiricalDistribution
> --------------------------------------------------------------------------------------
>
> Key: MATH-984
> URL: https://issues.apache.org/jira/browse/MATH-984
> Project: Commons Math
> Issue Type: Bug
> Affects Versions: 3.2, 3.1.1
> Environment: all
> Reporter: Radoslav Tsvetkov
>
> The generating function getNextValue() in
> org.apache.commons.math3.random.EmpiricalDistribution
> will generate wrong values for all distributions that are single-tailed or
> bounded, for example data resembling exponential or lognormal
> distributions.
> The problem can easily be seen in the code and tested.
> In the latest version of the code
> ...
> 490 return getKernel(stats).sample();
> ...
> it samples from a Gaussian distribution to "smooth" within the bin. A
> Gaussian distribution is unbounded, so it sometimes generates numbers
> outside the bin. When this happens in the last bin, it will generate wrong
> numbers.
> For example, for non-negative empirical data it will generate negative values.
> Additionally, the proposed algorithm simply returns the mean value of the
> bin when the bin contains only one value! This makes the generating function
> unusable for heavy-tailed distributions with a small number of values (for
> example, computer network traffic).
> Finally, Gaussian smoothing within the bin will greatly change some
> properties of the empirical distribution.
> The proposed method should be reworked to be applicable to real data, which
> often have limited ranges (either non-negative or bounded on both sides).
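
The out-of-bin effect described in the quoted report is easy to reproduce in
isolation. The sketch below does not call Commons Math. It assumes the
within-bin kernel is a Gaussian parameterised by the bin's mean and standard
deviation, and the bin statistics used (mean 0.3, standard deviation 0.3, for
a first bin starting at 0) are made up for illustration. The class and method
names are hypothetical.

```java
import java.util.Random;

/** Hypothetical demo: how often a Gaussian fitted to a bin falls below the bin's lower edge. */
final class GaussianBinLeakDemo {

    /** Counts how many of n draws from N(mean, sd^2) fall strictly below lowerBound. */
    static int countBelow(double mean, double sd, double lowerBound, int n, long seed) {
        Random rng = new Random(seed);
        int below = 0;
        for (int i = 0; i < n; i++) {
            double x = mean + sd * rng.nextGaussian(); // unbounded Gaussian draw
            if (x < lowerBound) {
                below++;
            }
        }
        return below;
    }

    public static void main(String[] args) {
        int n = 10000;
        // Assumed statistics of the first bin of some non-negative data set.
        int negatives = countBelow(0.3, 0.3, 0.0, n, 42L);
        System.out.println(negatives + " of " + n + " draws are negative");
    }
}
```

With these assumed numbers the lower edge lies one standard deviation below
the mean, so a noticeable share of the draws is expected to be negative even
though the underlying data are not.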