[ https://issues.apache.org/jira/browse/MATH-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737874#comment-14737874 ]
Phil Steitz commented on MATH-1246:
-----------------------------------
Otmar - that is a cool idea. So for a given sample with ties, the D statistic
would be the average of the D's you get from the different orderings of the
tied values, right? That is an alternative to what I did. I think the p-values
will definitely be different, but I wonder by how much. Note also that things
get a little complicated when more than two observations share the same value,
which we have to assume could happen.
How would you modify the current Monte Carlo approach to do this?
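
To make the averaging idea concrete, here is a rough sketch. It is not code
from Commons Math: the class TieAveragedKs and its methods are hypothetical
names, and it samples random orderings of the tied values instead of
enumerating all of them, which is my assumption about how this could be kept
cheap. Shuffling the pooled data and then stable-sorting by value leaves each
group of equal values in a random relative order, so groups of three or more
equal values need no special case.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

/** Hypothetical helper for illustration only; not part of Commons Math. */
final class TieAveragedKs {

    /** One observation tagged with the sample it came from. */
    private static final class Obs {
        final double value;
        final boolean fromX;

        Obs(double value, boolean fromX) {
            this.value = value;
            this.fromX = fromX;
        }
    }

    /** Two-sample D statistic for one random ordering of the tied values. */
    static double dWithRandomTieOrder(double[] x, double[] y, Random rng) {
        List<Obs> pooled = new ArrayList<>(x.length + y.length);
        for (double v : x) {
            pooled.add(new Obs(v, true));
        }
        for (double v : y) {
            pooled.add(new Obs(v, false));
        }
        // Shuffle, then stable-sort by value: equal values keep their shuffled order.
        Collections.shuffle(pooled, rng);
        pooled.sort(Comparator.comparingDouble((Obs o) -> o.value));

        int seenX = 0;
        int seenY = 0;
        double d = 0;
        // Step the two empirical CDFs one observation at a time, as if there were no ties.
        for (Obs o : pooled) {
            if (o.fromX) {
                seenX++;
            } else {
                seenY++;
            }
            double diff = Math.abs(seenX / (double) x.length - seenY / (double) y.length);
            d = Math.max(d, diff);
        }
        return d;
    }

    /** Average of D over a number of randomly sampled tie orderings. */
    static double averagedD(double[] x, double[] y, int orderings, long seed) {
        Random rng = new Random(seed);
        double sum = 0;
        for (int i = 0; i < orderings; i++) {
            sum += dWithRandomTieOrder(x, y, rng);
        }
        return sum / orderings;
    }

    public static void main(String[] args) {
        double[] x = {1, 2};
        double[] y = {2, 3};
        // The tied value 2 can be ordered two ways, giving D = 1.0 or D = 0.5.
        System.out.println(averagedD(x, y, 2000, 42L));
    }
}
```

This only covers the statistic. For the p-value, the same averaging would
presumably have to be applied to each resampled pair inside the Monte Carlo
loop, and that is the part I am least sure how to do.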
I tried to find references on how others handle this, but unfortunately most
packages just say you can't compute exact p-values in the presence of ties.
Given that we have two defensible but different definitions of the p-value,
I can see why. I will look some more to see if I can find
some math stat references to help us settle on the right definition.
Thanks for looking into this.
> Kolmogorov-Smirnov 2-sample test does not correctly handle ties
> ---------------------------------------------------------------
>
> Key: MATH-1246
> URL: https://issues.apache.org/jira/browse/MATH-1246
> Project: Commons Math
> Issue Type: Bug
> Reporter: Phil Steitz
>
> For small samples, KolmogorovSmirnovTest(double[], double[]) computes the
> distribution of a D-statistic for m-n sets with no ties. No warning or
> special handling is delivered in the presence of ties.