Radoslav Tsvetkov created MATH-984:
--------------------------------------

             Summary: Incorrect (bugged) generating function getNextValue() in 
.random.EmpiricalDistribution
                 Key: MATH-984
                 URL: https://issues.apache.org/jira/browse/MATH-984
             Project: Commons Math
          Issue Type: Bug
    Affects Versions: 3.1.1
         Environment: all
            Reporter: Radoslav Tsvetkov


The generating function getNextValue() in 
org.apache.commons.math3.random.EmpiricalDistribution
will generate wrong values for all Distributions that are single tailed or 
limited. For example Data which are resembling Exponential or Lognormal 
distributions.

The problem could be easily seen in code and tested.

In last version code
...
490               return getKernel(stats).sample();
...
it samples from Gaussian distribution to "smooth" in_the_bin. Obviously 
Gaussian Distribution is not limited and sometimes it does generates numbers 
outside the bin. In the case when it is the last bin it will generate wrong 
numbers. 

For example for empirical non-negative data it will generate negative rubbish.

  Additionally the proposed algorithm boldly returns only the mean value of the 
bin in case of one value! This last makes the generating function unusable for 
heavy tailed distributions with small number of values. (for example computer 
network traffic)

On the last place usage of Gaussian soothing in the bin will change greatly 
some empirical distribution properties.

The proposed method should be reworked to be applicable for real data which 
have often limited ranges. (either non-negative or both sides limited)




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to