Anders Conbere created MATH-1140:
------------------------------------

             Summary: Incorrect result from MannWhitneyUTest#mannWhitneyUTest with large datasets
                 Key: MATH-1140
                 URL: https://issues.apache.org/jira/browse/MATH-1140
             Project: Commons Math
          Issue Type: Bug
    Affects Versions: 3.3
            Reporter: Anders Conbere
            Priority: Minor


On large datasets, MannWhitneyUTest#mannWhitneyUTest returns the double value 
0.0 instead of the correct p-value. I suspect an overflow, but I haven't been 
able to track it down yet.

I'm afraid I'm not very good at Java, but I'm including a link to a public 
repository where you can reproduce the issue. Unfortunately, my implementation 
is written in Clojure.

https://github.com/aconbere/apache-commons-mann-whitney-bug

In summary, calling MannWhitneyUTest#mannWhitneyUTest with two randomly 
generated arrays (50k elements with a max value of 300) reliably reproduces 
the result 0.0. Reducing the size to something more modest, like 2k, gives 
correct p-value calculations.
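
A minimal sketch of one way such an overflow could arise, offered as an 
assumption and not as something traced in the Commons Math source: the usual 
normal approximation for the U statistic has mean n1*n2/2, and with two arrays 
of 50k elements the product n1*n2 is 2,500,000,000, which exceeds 
Integer.MAX_VALUE (2,147,483,647). With 2k elements the product is 4,000,000 
and fits comfortably. The class and method names below are hypothetical and 
are not part of Commons Math.

```java
public class SampleSizeProduct {

    // Hypothetical helper: mean of U under the null hypothesis (n1*n2/2),
    // with the product computed in 32-bit int arithmetic.
    static double meanUInt(int n1, int n2) {
        int product = n1 * n2; // wraps silently once n1*n2 > Integer.MAX_VALUE
        return product / 2.0;
    }

    // Same quantity, but the product is widened to long before multiplying.
    static double meanULong(int n1, int n2) {
        long product = (long) n1 * n2;
        return product / 2.0;
    }

    public static void main(String[] args) {
        // The two sizes mentioned in the report: 2k behaves, 50k does not.
        for (int n : new int[] {2_000, 50_000}) {
            System.out.printf("n=%d  int: %.1f  long: %.1f%n",
                    n, meanUInt(n, n), meanULong(n, n));
        }
    }
}
```

For n = 2,000 both versions agree, while for n = 50,000 the int version 
yields a negative mean. Whether this is what actually happens inside 
MannWhitneyUTest still needs to be checked against the library code.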



--
This message was sent by Atlassian JIRA
(v6.2#6252)
