[ 
https://issues.apache.org/jira/browse/MATH-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737500#comment-14737500
 ] 

Otmar Ertl edited comment on MATH-1246 at 9/9/15 9:19 PM:
----------------------------------------------------------

I am thinking of another way to treat ties:

The probability that two values sampled from a continuous distribution are 
equal is 0, so one of them is always greater than the other. However, once 
they are represented as doubles we may no longer be able to distinguish them. 
Therefore, the best we can do is to treat both orderings as equally likely. 
For example, if we have x = (0, 3, 5) and y = (5, 6, 7), we get two different 
values for the observed D-statistic. If we assume the value 5 in x to be 
smaller than that in y, we get D=1. Otherwise, we get D=2/3. Each case has 
probability 0.5. In the general case, we can determine a discrete 
distribution describing all possible values of the observed D-statistic. 
Finally, we calculate the p-value for each of those possible values and take 
their probability-weighted average as the final p-value.

Does this make sense? If yes, I think there is a way to adapt the new Monte 
Carlo approach.
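
The idea above could be sketched roughly as follows. This is only an 
illustration under stated assumptions, not code from Commons Math: the class 
name TieAveragedKs, both method names, and the caller-supplied pValueOfD 
function are hypothetical. The sketch assumes non-empty samples without NaN, 
treats every ordering of the x and y entries inside a tied group as equally 
likely, and enumerates those orderings recursively, so it is only practical 
for small samples with few ties.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.DoubleUnaryOperator;

public class TieAveragedKs {

    /** Maps each possible observed D value to its probability under random tie-breaking. */
    public static Map<Double, Double> dDistribution(double[] x, double[] y) {
        if (x.length == 0 || y.length == 0) {
            throw new IllegalArgumentException("both samples must be non-empty");
        }
        // For every distinct value, count its occurrences in x ([0]) and in y ([1]).
        TreeMap<Double, int[]> groups = new TreeMap<>();
        for (double v : x) groups.computeIfAbsent(v, k -> new int[2])[0]++;
        for (double v : y) groups.computeIfAbsent(v, k -> new int[2])[1]++;
        int[][] counts = groups.values().toArray(new int[0][]);

        Map<Double, Double> dist = new TreeMap<>();
        walk(counts, 0, counts[0][0], counts[0][1], 0, 0, 0, 1.0, x.length, y.length, dist);
        return dist;
    }

    /**
     * Walks through the merged sample in ascending order. Inside a group of tied
     * values, the next point is taken from x with probability a / (a + b) and
     * from y otherwise, which makes all x/y orderings of the group equally likely.
     * max tracks the largest |i * n - j * m| seen so far, i.e. D scaled by m * n.
     */
    private static void walk(int[][] counts, int g, int a, int b,
                             long i, long j, long max, double w,
                             int m, int n, Map<Double, Double> dist) {
        if (a == 0 && b == 0) {
            if (g + 1 >= counts.length) {
                // All points consumed: record the resulting D with its weight.
                dist.merge((double) max / ((double) m * n), w, Double::sum);
            } else {
                walk(counts, g + 1, counts[g + 1][0], counts[g + 1][1],
                     i, j, max, w, m, n, dist);
            }
            return;
        }
        double total = a + b;
        if (a > 0) {
            long d = Math.abs((i + 1) * n - j * m);
            walk(counts, g, a - 1, b, i + 1, j, Math.max(max, d), w * a / total, m, n, dist);
        }
        if (b > 0) {
            long d = Math.abs(i * n - (j + 1) * m);
            walk(counts, g, a, b - 1, i, j + 1, Math.max(max, d), w * b / total, m, n, dist);
        }
    }

    /** Probability-weighted average of the p-values of all possible D values. */
    public static double averagedP(double[] x, double[] y, DoubleUnaryOperator pValueOfD) {
        double p = 0;
        for (Map.Entry<Double, Double> e : dDistribution(x, y).entrySet()) {
            p += e.getValue() * pValueOfD.applyAsDouble(e.getKey());
        }
        return p;
    }
}
```

For x = (0, 3, 5) and y = (5, 6, 7), dDistribution yields the two outcomes 
D=2/3 and D=1, each with weight 0.5, matching the example in the comment. 
averagedP then needs a function that turns a D value into a p-value for the 
given sample sizes; which exact or approximate method to plug in there is 
left open here.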


was (Author: otmar ertl):
I am thinking of another way to treat ties:

The probability that two values sampled from a continuous distribution are 
equal is 0, so one of them is always greater than the other. However, once 
they are represented as doubles we may no longer be able to distinguish them. 
Therefore, the best we can do is to treat both orderings as equally likely. 
For example, if we have x = (0, 3, 5) and y = (5, 6, 7), we get two different 
values for the observed D-statistic. If we assume the value 5 in x to be 
smaller than that in y, we get D=3. Otherwise, we get D=2. Each case has 
probability 0.5. In the general case, we can determine a discrete 
distribution describing all possible values of the observed D-statistic. 
Finally, we calculate the p-value for each of those possible values and take 
their probability-weighted average as the final p-value.

Does this make sense? If yes, I think there is a way to adapt the new Monte 
Carlo approach.

> Kolmogorov-Smirnov 2-sample test does not correctly handle ties
> ---------------------------------------------------------------
>
>                 Key: MATH-1246
>                 URL: https://issues.apache.org/jira/browse/MATH-1246
>             Project: Commons Math
>          Issue Type: Bug
>            Reporter: Phil Steitz
>
> For small samples, KolmogorovSmirnovTest(double[], double[]) computes the 
> distribution of a D-statistic for m-n sets with no ties.  No warning or 
> special handling is delivered in the presence of ties.
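
Until ties are handled inside the test, a caller could check for them before 
relying on the small-sample result. The following is a minimal sketch of such 
a check; the class name TieCheck and the method hasTies are hypothetical and 
not part of Commons Math.

```java
import java.util.HashSet;
import java.util.Set;

public class TieCheck {

    /**
     * Returns true if any value occurs more than once in the combined sample,
     * whether within one array or across both.
     * Values are compared as boxed Doubles, so 0.0 and -0.0 count as different.
     */
    public static boolean hasTies(double[] x, double[] y) {
        Set<Double> seen = new HashSet<>();
        for (double v : x) {
            if (!seen.add(v)) return true;
        }
        for (double v : y) {
            if (!seen.add(v)) return true;
        }
        return false;
    }
}
```

A caller might log a warning or switch to a different p-value computation 
when hasTies returns true.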



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
