[ 
https://issues.apache.org/jira/browse/MATH-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746924#comment-14746924
 ] 

Otmar Ertl commented on MATH-1246:
----------------------------------

The p-value is the probability that the observed KS statistic is smaller than 
the KS statistic obtained when two random samples of the same sizes are drawn 
from the underlying distribution. In the no-ties case this value can be 
calculated exactly without knowing the underlying distribution. In the case of 
ties, the p-value cannot be calculated exactly. There are different approaches 
to approximating the p-value in the presence of ties:
* Approximation of the underlying distribution by the observed data, which 
definitely makes sense for bootstrapping, where the sample sizes are usually 
large. However, in our case the underlying distribution is estimated from small 
samples, since this is the domain of the exactP method. Therefore, I 
doubt that the calculated p-value deserves the label "exact" in this case.
* Assumption that orderings of observed equal values are equally likely, which 
of course is also an approximation.

I still do not understand why the first approach should be the true one.
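
To make the two approaches listed above concrete, here is a minimal, 
self-contained Java sketch. It is not the Commons Math implementation: the 
class TiedKsSketch and all of its method names are hypothetical, the 
non-strict comparison (>=) is an assumption, and averaging the no-ties 
p-value over random tie orderings is only one possible reading of the second 
approach.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public class TiedKsSketch {

    /** Two-sample KS statistic; the ECDFs are compared only after each block of equal values. */
    static double ksStatistic(double[] x, double[] y) {
        double[] sx = x.clone(), sy = y.clone();
        Arrays.sort(sx);
        Arrays.sort(sy);
        int i = 0, j = 0;
        double d = 0.0;
        while (i < sx.length && j < sy.length) {
            double v = Math.min(sx[i], sy[j]);
            while (i < sx.length && sx[i] == v) i++;
            while (j < sy.length && sy[j] == v) j++;
            d = Math.max(d, Math.abs((double) i / sx.length - (double) j / sy.length));
        }
        return d;
    }

    /** Approach 1: resample both samples from the pooled observed data (Monte Carlo). */
    static double bootstrapP(double[] x, double[] y, int iterations, long seed) {
        double[] pooled = Arrays.copyOf(x, x.length + y.length);
        System.arraycopy(y, 0, pooled, x.length, y.length);
        double observed = ksStatistic(x, y);
        Random rng = new Random(seed);
        int hits = 0;
        for (int it = 0; it < iterations; it++) {
            double[] bx = new double[x.length], by = new double[y.length];
            for (int k = 0; k < bx.length; k++) bx[k] = pooled[rng.nextInt(pooled.length)];
            for (int k = 0; k < by.length; k++) by[k] = pooled[rng.nextInt(pooled.length)];
            // Non-strict comparison is an assumption of this sketch.
            if (ksStatistic(bx, by) >= observed - 1e-12) hits++;
        }
        return (double) hits / iterations;
    }

    /** Counts monotone lattice paths (0,0) -> (n,m) whose points all satisfy |i/n - j/m| < d. */
    static double countPaths(int n, int m, double d) {
        double[][] paths = new double[n + 1][m + 1];
        for (int i = 0; i <= n; i++) {
            for (int j = 0; j <= m; j++) {
                if (Math.abs((double) i / n - (double) j / m) >= d - 1e-12) continue; // blocked
                if (i == 0 && j == 0) paths[i][j] = 1.0;
                else paths[i][j] = (i > 0 ? paths[i - 1][j] : 0.0) + (j > 0 ? paths[i][j - 1] : 0.0);
            }
        }
        return paths[n][m];
    }

    /** Exact P(D >= d) without ties: every interleaving of the two samples is equally likely. */
    static double exactNoTiesP(double d, int n, int m) {
        return 1.0 - countPaths(n, m, d) / countPaths(n, m, Double.POSITIVE_INFINITY);
    }

    /** Approach 2: order equal values uniformly at random, then apply the no-ties formula. */
    static double randomTieOrderP(double[] x, double[] y, int iterations, long seed) {
        int n = x.length;
        int m = y.length;
        Random rng = new Random(seed);
        double sum = 0.0;
        for (int it = 0; it < iterations; it++) {
            List<Integer> idx = new ArrayList<>();
            for (int k = 0; k < n + m; k++) idx.add(k);
            Collections.shuffle(idx, rng);
            // List.sort is stable, so tied values keep their shuffled (random) relative order.
            idx.sort(Comparator.comparingDouble(k -> k < n ? x[k] : y[k - n]));
            int i = 0, j = 0;
            double d = 0.0;
            for (int k : idx) {
                if (k < n) i++; else j++;
                d = Math.max(d, Math.abs((double) i / n - (double) j / m));
            }
            sum += exactNoTiesP(d, n, m);
        }
        return sum / iterations;
    }
}
```

For two tie-free samples such as {1, 2, 3} and {4, 5, 6}, randomTieOrderP 
reduces to exactNoTiesP(1.0, 3, 3), which is 0.1 up to rounding (2 of the 20 
interleavings). For samples with ties the two methods will generally return 
different values, and both are approximations.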

> Kolmogorov-Smirnov 2-sample test does not correctly handle ties
> ---------------------------------------------------------------
>
>                 Key: MATH-1246
>                 URL: https://issues.apache.org/jira/browse/MATH-1246
>             Project: Commons Math
>          Issue Type: Bug
>            Reporter: Phil Steitz
>
> For small samples, KolmogorovSmirnovTest(double[], double[]) computes the 
> distribution of a D-statistic for m-n sets with no ties.  No warning or 
> special handling is delivered in the presence of ties.



