Phil Steitz created MATH-1310:
---------------------------------

             Summary: Improve accuracy and performance of 2-sample Kolmogorov-Smirnov test
                 Key: MATH-1310
                 URL: https://issues.apache.org/jira/browse/MATH-1310
             Project: Commons Math
          Issue Type: Bug
    Affects Versions: 3.5
            Reporter: Phil Steitz
             Fix For: 3.6


As of 3.5, the exactP method used to compute exact p-values for 2-sample
Kolmogorov-Smirnov tests is very slow, because it is based on a naive
implementation that enumerates every partition of the combined sample into
subsets of sizes n and m.  As a result, its use is not recommended for problems
where the product of the two sample sizes exceeds 100, and the
kolmogorovSmirnovTest method uses it only for samples in this range.  For sample
size products between 100 and 10000 (the asymptotic KS distribution is used only
for larger products), kolmogorovSmirnovTest currently falls back to Monte Carlo
simulation.  Convergence is poor for many problem instances, resulting in
inaccurate p-values.
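The cost of enumeration can be seen in a brute-force sketch. It is a
hypothetical illustration, not the 3.5 exactP code: the class NaiveKsExactP and
method naiveExactP are invented names, and it assumes no ties, so that D depends
only on the order in which the two samples interleave.  Under the null
hypothesis all C(n+m, n) interleavings are equally likely, so P(D >= d) is the
fraction of them whose D reaches d.  For n = m = 10 (product 100) that is
already 184,756 interleavings, and for n = m = 100 (product 10000) it is about
9 x 10^58.

```java
// Hypothetical brute-force illustration; not the Commons Math 3.5 exactP code.
final class NaiveKsExactP {

    /**
     * P(D >= d) for samples of sizes n and m with no ties, computed by
     * enumerating all C(n + m, n) equally likely interleavings.
     */
    static double naiveExactP(double d, int n, int m) {
        // D * n * m = max |i*m - j*n| is an integer; the tolerance absorbs rounding in d.
        long threshold = (long) Math.ceil(d * n * m - 1e-7);
        long[] counts = new long[2]; // counts[0] = interleavings seen, counts[1] = those with D >= d
        enumerate(0, 0, 0L, n, m, threshold, counts);
        return (double) counts[1] / counts[0];
    }

    /** Extends a partial interleaving holding i values from sample 1 and j from sample 2. */
    private static void enumerate(int i, int j, long maxDiff, int n, int m,
                                  long threshold, long[] counts) {
        if (i == n && j == m) {
            counts[0]++;
            if (maxDiff >= threshold) {
                counts[1]++;
            }
            return;
        }
        if (i < n) { // next-smallest value comes from sample 1
            long diff = Math.abs((long) (i + 1) * m - (long) j * n);
            enumerate(i + 1, j, Math.max(maxDiff, diff), n, m, threshold, counts);
        }
        if (j < m) { // next-smallest value comes from sample 2
            long diff = Math.abs((long) i * m - (long) (j + 1) * n);
            enumerate(i, j + 1, Math.max(maxDiff, diff), n, m, threshold, counts);
        }
    }
}
```

For n = m = 2 there are 6 interleavings and only XXYY and YYXX reach D = 1,
so naiveExactP(1.0, 2, 2) is 1/3.  The recursion visits every interleaving,
which is why this approach is impractical beyond tiny samples.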

To eliminate the need for the Monte Carlo simulation and improve the
performance of exactP itself, a faster exactP implementation should be added.
This can be done by unwinding the recursive functions defined in Chapter 5,
Table 5.2 of:

Wilcox, Rand. 2012. Introduction to Robust Estimation and Hypothesis Testing, 
Chapter 5, 3rd Ed. Academic Press.
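The idea of replacing enumeration with an iterative recursion can be sketched
as a lattice-path count.  This is a minimal sketch under stated assumptions,
not a transcription of Table 5.2: it assumes no ties, and the class
LatticeKsExactP and its static exactP method are hypothetical names, not the
Commons Math signature.  Each interleaving is a monotone path from (0,0) to
(n,m), and D < d exactly when the path never touches a point with
|i*m - j*n| >= d*n*m.  Instead of raw path counts, the sketch stores each count
divided by C(i+j, i), using C(i+j, i) = C(i+j-1, i-1) + C(i+j-1, i), so the
values stay in [0,1] and cannot overflow; that normalization is a choice made
for this sketch.

```java
// Hypothetical sketch of an iterative lattice-path recursion; not Commons Math code.
final class LatticeKsExactP {

    /**
     * P(D >= d) for samples of sizes n and m with no ties.
     * After processing row i, u[j] is the fraction of monotone paths from
     * (0,0) to (i,j) that stay strictly inside |i*m - j*n| < d*n*m.
     */
    static double exactP(double d, int n, int m) {
        // Work in integers; the tolerance absorbs rounding in d * n * m.
        long threshold = (long) Math.ceil(d * n * m - 1e-7);
        double[] u = new double[m + 1]; // one row of the lattice, reused for every i
        for (int i = 0; i <= n; i++) {
            for (int j = 0; j <= m; j++) {
                double value;
                if (i == 0 && j == 0) {
                    value = 1.0;
                } else if (i == 0) {
                    value = u[j - 1];          // only reachable from (0, j-1)
                } else if (j == 0) {
                    value = u[0];              // only reachable from (i-1, 0)
                } else {
                    // u[j] still holds row i-1; u[j-1] already holds row i.
                    value = (i * u[j] + j * u[j - 1]) / (i + j);
                }
                // Points where the empirical CDFs differ by at least d are forbidden.
                boolean outside = Math.abs((long) i * m - (long) j * n) >= threshold;
                u[j] = outside ? 0.0 : value;
            }
        }
        return 1.0 - u[m]; // complement of "path never reaches D >= d"
    }
}
```

The double loop visits (n+1)(m+1) points, so the cost is O(nm) time and O(m)
memory, which is at most about 10^4 steps when the sample size product is at
most 10000.  For example, exactP(1.0, 2, 2) returns 1/3, matching a direct count
of the 6 possible interleavings.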



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
