[ https://issues.apache.org/jira/browse/MATH-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746924#comment-14746924 ]
Otmar Ertl commented on MATH-1246:
----------------------------------
The p-value is the probability that the observed KS-statistic is smaller than
the KS-statistic obtained when two random samples of the same sizes are drawn
from the underlying distribution. In the no-ties case this value can be
calculated exactly without knowing the underlying distribution. In the case of
ties, the p-value cannot be calculated exactly. There are different approaches
to calculating an approximation of the p-value in the tie case:
* Approximation of the underlying distribution by the observed data, which
definitely makes sense for bootstrapping where the sample sizes are usually
large. However, in our case the underlying distribution is estimated from small
sample sizes, since this is the domain for the exactP method. Therefore, I
doubt that the calculated p-value deserves the label "exact" in this case.
* Assumption that orderings of observed equal values are equally likely, which
of course is also an approximation.
I still do not understand why the first approach should be the true one.
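
To make the no-ties statement above concrete, here is a minimal sketch of why
the exact p-value needs no knowledge of the underlying distribution: without
ties, every interleaving of the two samples is equally likely under the null
hypothesis, so the p-value is a ratio of lattice-path counts. This is not the
code behind the exactP method in Commons Math; the class and method names are
hypothetical, and the sketch assumes the non-strict convention P(D >= d).

```java
import java.util.Arrays;

// Hypothetical illustration only; not the Commons Math implementation.
class KsExactSketch {

    /**
     * Scaled two-sample statistic n*m*D = max |i*m - j*n| along the merged
     * samples. Assumes no ties: equal values across x and y would be ordered
     * arbitrarily here (y first), which is exactly the problem discussed.
     */
    static long scaledStatistic(double[] x, double[] y) {
        double[] a = x.clone();
        double[] b = y.clone();
        Arrays.sort(a);
        Arrays.sort(b);
        int n = a.length, m = b.length, i = 0, j = 0;
        long max = 0;
        while (i < n || j < m) {
            if (j == m || (i < n && a[i] < b[j])) {
                i++;
            } else {
                j++;
            }
            max = Math.max(max, Math.abs((long) i * m - (long) j * n));
        }
        return max;
    }

    /**
     * P(n*m*D >= c) when all C(n+m, n) interleavings are equally likely.
     * Counts monotone lattice paths from (0,0) to (n,m) that stay strictly
     * inside the band |i*m - j*n| < c, and divides by the total path count.
     */
    static double exactP(int n, int m, long c) {
        double[][] all = new double[n + 1][m + 1];
        double[][] inside = new double[n + 1][m + 1];
        for (int i = 0; i <= n; i++) {
            for (int j = 0; j <= m; j++) {
                if (i == 0 && j == 0) {
                    all[0][0] = 1;
                    inside[0][0] = 1;
                    continue;
                }
                all[i][j] = (i > 0 ? all[i - 1][j] : 0)
                          + (j > 0 ? all[i][j - 1] : 0);
                boolean inBand = Math.abs((long) i * m - (long) j * n) < c;
                inside[i][j] = inBand
                    ? (i > 0 ? inside[i - 1][j] : 0) + (j > 0 ? inside[i][j - 1] : 0)
                    : 0;
            }
        }
        return 1.0 - inside[n][m] / all[n][m];
    }

    public static void main(String[] args) {
        double[] x = {1, 2};
        double[] y = {3, 4};
        long c = scaledStatistic(x, y);
        System.out.println("n*m*D = " + c + ", p = " + exactP(x.length, y.length, c));
    }
}
```

For x = {1, 2} and y = {3, 4} the samples are fully separated, so D = 1 and
only 2 of the 6 possible interleavings reach that value, giving a p-value of
about 1/3. With ties, the walk in scaledStatistic is no longer uniquely
defined, which is where the two approximations listed above come in.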
> Kolmogorov-Smirnov 2-sample test does not correctly handle ties
> ---------------------------------------------------------------
>
> Key: MATH-1246
> URL: https://issues.apache.org/jira/browse/MATH-1246
> Project: Commons Math
> Issue Type: Bug
> Reporter: Phil Steitz
>
> For small samples, KolmogorovSmirnovTest(double[], double[]) computes the
> distribution of a D-statistic for m-n sets with no ties. No warning or
> special handling is delivered in the presence of ties.