[
https://issues.apache.org/jira/browse/MATH-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745495#comment-14745495
]
Otmar Ertl edited comment on MATH-1246 at 9/15/15 2:00 PM:
-----------------------------------------------------------
After some research, I have the feeling that we are discussing how to define
zero divided by zero. There are at least two methods for calculating a
reasonable p-value in the presence of ties:
# The method you have proposed, which also seems to be known as the
permutation method. Averaging over only some permutations corresponds to the
bootstrap method, and averaging over all possible permutations corresponds to
the current exactP() implementation.
# Another method is to add some jitter to the sampled values to break ties.
(This Google search
https://www.google.com/?gfe_rd=cr&ei=qCL4VaKvNIWI8QfLibD4Bg&gws_rd=cr&fg=1#q=jitter+kolmogorov+smirnov
immediately gives you a couple of references.) This corresponds to the method
I have proposed: adding small random values to tied observations to get a
strict ordering amounts to choosing a random ordering of the ties. Averaging
over all possible orderings would also lead to a well-defined p-value.
Maybe the user should be able to choose how ties are resolved?
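To make the jitter idea concrete, here is a minimal sketch in plain Java. It is
not the Commons Math implementation and does not compute a p-value; the class
TieJitterSketch and the helpers ksStatistic, hasTies and jitter are
hypothetical names chosen for illustration. It assumes the jitter scale is
much smaller than the smallest gap between distinct sample values, so that
only the relative order of tied observations changes.

```java
import java.util.Arrays;
import java.util.Random;

// Illustrative sketch only; all names here are hypothetical, not Commons Math API.
class TieJitterSketch {

    /**
     * Two-sample KS statistic D = sup |F_x(t) - F_y(t)|. The difference is
     * evaluated only after every copy of the current value has been consumed
     * from both samples, so tied values are treated as a single jump.
     */
    public static double ksStatistic(double[] x, double[] y) {
        double[] sx = x.clone();
        double[] sy = y.clone();
        Arrays.sort(sx);
        Arrays.sort(sy);
        int i = 0;
        int j = 0;
        double d = 0.0;
        while (i < sx.length && j < sy.length) {
            double t = Math.min(sx[i], sy[j]);
            while (i < sx.length && sx[i] == t) {
                i++;
            }
            while (j < sy.length && sy[j] == t) {
                j++;
            }
            double diff = Math.abs((double) i / sx.length - (double) j / sy.length);
            d = Math.max(d, diff);
        }
        return d;
    }

    /** Returns true if the pooled sample contains at least one repeated value. */
    public static boolean hasTies(double[] x, double[] y) {
        double[] pooled = new double[x.length + y.length];
        System.arraycopy(x, 0, pooled, 0, x.length);
        System.arraycopy(y, 0, pooled, x.length, y.length);
        Arrays.sort(pooled);
        for (int k = 1; k < pooled.length; k++) {
            if (pooled[k] == pooled[k - 1]) {
                return true;
            }
        }
        return false;
    }

    /**
     * Adds uniform noise from [-scale/2, scale/2) to every value.
     * Assumption: scale is far smaller than the smallest gap between
     * distinct values, so only ties are reordered.
     */
    public static double[] jitter(double[] values, double scale, Random rng) {
        double[] out = new double[values.length];
        for (int k = 0; k < values.length; k++) {
            out[k] = values[k] + scale * (rng.nextDouble() - 0.5);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 2.0, 3.0};
        double[] y = {2.0, 2.0, 4.0, 5.0};
        Random rng = new Random(42L);

        System.out.println("ties present: " + hasTies(x, y));
        System.out.println("D with ties:  " + ksStatistic(x, y));

        // Each jittered copy picks one random strict ordering of the ties,
        // so D may differ from run to run.
        for (int run = 0; run < 5; run++) {
            double d = ksStatistic(jitter(x, 1e-6, rng), jitter(y, 1e-6, rng));
            System.out.println("D after jitter, run " + run + ": " + d);
        }
    }
}
```

With these inputs the jittered statistic depends on which ordering of the four
tied 2.0 values the noise happens to produce. That is the ambiguity discussed
above: a single jittered run picks one ordering, whereas averaging over
orderings would remove the dependence on the random draw.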
> Kolmogorov-Smirnov 2-sample test does not correctly handle ties
> ---------------------------------------------------------------
>
> Key: MATH-1246
> URL: https://issues.apache.org/jira/browse/MATH-1246
> Project: Commons Math
> Issue Type: Bug
> Reporter: Phil Steitz
>
> For small samples, KolmogorovSmirnovTest(double[], double[]) computes the
> distribution of a D-statistic for m-n sets with no ties. No warning or
> special handling is delivered in the presence of ties.