[
https://issues.apache.org/jira/browse/MATH-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737500#comment-14737500
]
Otmar Ertl edited comment on MATH-1246 at 9/9/15 9:19 PM:
----------------------------------------------------------
I am thinking of another way to treat ties:
The probability that two values sampled from a continuous distribution are
equal is 0, so one of them is always greater than the other. However,
represented as doubles, we cannot distinguish them. Therefore, the best we
can do is to treat both orderings as equally likely. For example, if we have
x = (0, 3, 5) and y = (5, 6, 7), there are two possible values for the observed
D-statistic. If we assume the value 5 in x to be smaller than the 5 in y, we
get D=1. Otherwise, we get D=2/3. Each case has probability 0.5. In the
general case, we can determine a discrete distribution describing all possible
values of the observed D-statistic. Finally, we calculate the p-value for each
of those possible values and take their probability-weighted average as
the final p-value.
Does this make sense? If yes, I think there is a way to adapt the new Monte
Carlo approach.
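To make the proposal concrete, here is a minimal Java sketch of the
enumeration for small samples. It is only an illustration under assumptions:
the class TieAwareKs, its methods, and the pValueOfD callback are hypothetical
names, not part of Commons Math. A tied group with a copies in x and b copies
in y is resolved by taking the next x value with probability a/(a+b), which
reduces to 0.5 for a single tied pair. Inputs are assumed non-empty and free
of NaN.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.DoubleUnaryOperator;

/** Hypothetical helper illustrating the idea; not part of Commons Math. */
final class TieAwareKs {

    /** Maps each possible D value to its probability under random tie-breaking. */
    static Map<Double, Double> dDistribution(double[] x, double[] y) {
        double[] sx = x.clone();
        double[] sy = y.clone();
        Arrays.sort(sx);
        Arrays.sort(sy);
        // Collect by integer numerator |i*m - j*n| so equal D values merge exactly.
        Map<Long, Double> byNumerator = new TreeMap<>();
        walk(sx, sy, 0, 0, 0L, 1.0, byNumerator);
        double denom = (double) sx.length * sy.length;
        Map<Double, Double> result = new TreeMap<>();
        for (Map.Entry<Long, Double> e : byNumerator.entrySet()) {
            result.put(e.getKey() / denom, e.getValue());
        }
        return result;
    }

    /** Probability-weighted average of a caller-supplied p-value function of D. */
    static double weightedPValue(Map<Double, Double> dist, DoubleUnaryOperator pValueOfD) {
        double p = 0.0;
        for (Map.Entry<Double, Double> e : dist.entrySet()) {
            p += e.getValue() * pValueOfD.applyAsDouble(e.getKey());
        }
        return p;
    }

    /** Merges the two sorted samples, branching wherever x[i] equals y[j]. */
    private static void walk(double[] x, double[] y, int i, int j,
                             long maxNum, double weight, Map<Long, Double> out) {
        if (i == x.length && j == y.length) {
            out.merge(maxNum, weight, Double::sum);
            return;
        }
        boolean canX = i < x.length;
        boolean canY = j < y.length;
        if (canX && canY && x[i] == y[j]) {
            int a = countEqual(x, i); // remaining copies of the tied value in x
            int b = countEqual(y, j); // remaining copies of the tied value in y
            step(x, y, i + 1, j, maxNum, weight * a / (a + b), out);
            step(x, y, i, j + 1, maxNum, weight * b / (a + b), out);
        } else if (canX && (!canY || x[i] < y[j])) {
            step(x, y, i + 1, j, maxNum, weight, out);
        } else {
            step(x, y, i, j + 1, maxNum, weight, out);
        }
    }

    /** Updates the running maximum of |i/n - j/m| (scaled by n*m) and continues. */
    private static void step(double[] x, double[] y, int i, int j,
                             long maxNum, double weight, Map<Long, Double> out) {
        long num = Math.abs((long) i * y.length - (long) j * x.length);
        walk(x, y, i, j, Math.max(maxNum, num), weight, out);
    }

    /** Counts entries equal to a[from], starting at index from. */
    private static int countEqual(double[] a, int from) {
        int k = from;
        while (k < a.length && a[k] == a[from]) {
            k++;
        }
        return k - from;
    }
}
```

For x = (0, 3, 5) and y = (5, 6, 7), dDistribution yields D=2/3 and D=1, each
with probability 0.5. The recursion branches at every tie, so its cost grows
quickly with the number of ties; it is meant for small samples only.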
was (Author: otmar ertl):
I am thinking of another way to treat ties:
The probability that two values sampled from a continuous distribution are
equal is equal to 0. One of them is always greater than the other. However,
represented as doubles we cannot distinguish them. Therefore, the best we
can do is to treat both cases as equally likely. For example, if we have x = (0,
3, 5) and y = (5, 6, 7) we get two different values for the observed
D-statistic. If we assume value 5 in x to be smaller than that in y, we would
get D=3. Otherwise, we would get D=2, both with probability 0.5. In the general
case, we can determine a discrete distribution describing all possible values
of the observed D-statistic. Finally, we calculate the p-value for each of
those possible values and calculate the weighted average which we take as the
final p-value.
Does this make sense? If yes, I think there is a way to adapt the new Monte
Carlo approach.
> Kolmogorov-Smirnov 2-sample test does not correctly handle ties
> ---------------------------------------------------------------
>
> Key: MATH-1246
> URL: https://issues.apache.org/jira/browse/MATH-1246
> Project: Commons Math
> Issue Type: Bug
> Reporter: Phil Steitz
>
> For small samples, KolmogorovSmirnovTest(double[], double[]) computes the
> distribution of a D-statistic for m-n sets with no ties. No warning or
> special handling is delivered in the presence of ties.