[ 
https://issues.apache.org/jira/browse/MATH-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737874#comment-14737874
 ] 

Phil Steitz commented on MATH-1246:
-----------------------------------

Otmar - that is a cool idea.  So for a given sample with ties, the D statistic 
would be the average of the D values obtained over the different possible 
orderings of the tied values, right?  That is an alternative to what I did.  
I expect the resulting p-values to differ, but I wonder by how much.  Note 
also that things get more complicated when more than two observations share 
the same value, which we have to assume could happen.
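
To make the averaging idea concrete, here is a minimal sketch of how I read 
it.  It is an illustration only, not the Commons Math code: the class and 
method names (TieAveragedD, dForRandomOrdering, averageD) are made up, and it 
approximates the average by sampling random orderings of the tied values 
rather than enumerating all of them.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

// Hypothetical illustration class; not part of Commons Math.
class TieAveragedD {

    /** D statistic for one ordering in which ties are broken by random keys. */
    static double dForRandomOrdering(double[] x, double[] y, Random rng) {
        int n = x.length;
        int m = y.length;
        // Each row: {value, sample label (0 = x, 1 = y), random tie-breaking key}.
        double[][] rows = new double[n + m][];
        for (int i = 0; i < n; i++) {
            rows[i] = new double[] {x[i], 0, rng.nextDouble()};
        }
        for (int j = 0; j < m; j++) {
            rows[n + j] = new double[] {y[j], 1, rng.nextDouble()};
        }
        // Sort by value; equal values fall in the order of their random keys.
        Arrays.sort(rows, Comparator.comparingDouble((double[] r) -> r[0])
                                    .thenComparingDouble(r -> r[2]));
        // Walk the merged sample one observation at a time and track the
        // largest gap between the two empirical distribution functions.
        int seenX = 0;
        int seenY = 0;
        double d = 0;
        for (double[] r : rows) {
            if (r[1] == 0) {
                seenX++;
            } else {
                seenY++;
            }
            d = Math.max(d, Math.abs((double) seenX / n - (double) seenY / m));
        }
        return d;
    }

    /** Average of D over randomly sampled orderings of the tied values. */
    static double averageD(double[] x, double[] y, int iterations, Random rng) {
        double sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += dForRandomOrdering(x, y, rng);
        }
        return sum / iterations;
    }

    public static void main(String[] args) {
        double[] x = {1, 2};
        double[] y = {2, 3};
        System.out.println(averageD(x, y, 2000, new Random(42L)));
    }
}
```

For x = {1, 2} and y = {2, 3} there are two ways to order the tied 2's across 
the samples, giving D = 1.0 and D = 0.5, so the sampled average should land 
near 0.75.  With three or more tied values the number of orderings grows 
quickly, which is why the sketch samples instead of enumerating.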

How would you modify the current Monte Carlo approach to do this?

I tried to find references on how others handle this, but unfortunately most 
packages just say you can't compute exact p-values in the presence of ties.  
Given that we have two different, defensible definitions of the p-value, I can 
see why.  I will look some more to see if I can find some math stat references 
to help us settle on the right definition.

Thanks for looking into this.

> Kolmogorov-Smirnov 2-sample test does not correctly handle ties
> ---------------------------------------------------------------
>
>                 Key: MATH-1246
>                 URL: https://issues.apache.org/jira/browse/MATH-1246
>             Project: Commons Math
>          Issue Type: Bug
>            Reporter: Phil Steitz
>
> For small samples, KolmogorovSmirnovTest(double[], double[]) computes the 
> distribution of a D-statistic for m-n sets with no ties.  No warning or 
> special handling is delivered in the presence of ties.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
