[https://issues.apache.org/jira/browse/MATH-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14745495#comment-14745495]

Otmar Ertl edited comment on MATH-1246 at 9/15/15 2:00 PM:
-----------------------------------------------------------

After some research, I have the feeling we are discussing how to define zero
divided by zero. There are at least two methods for calculating a reasonable
p-value in the presence of ties:
# The method you proposed, which also seems to be known as the permutation
method. Averaging over only some permutations corresponds to the bootstrap
method, and averaging over all possible permutations corresponds to the
current exactP() implementation.
# Adding some jitter to the sampled values to break ties. (This Google search
https://www.google.com/?gfe_rd=cr&ei=qCL4VaKvNIWI8QfLibD4Bg&gws_rd=cr&fg=1#q=jitter+kolmogorov+smirnov
immediately turns up a couple of references.) This corresponds to the method
I proposed: adding small random values to tied observations to obtain a
strict ordering corresponds to choosing a random ordering of the ties.
Averaging over all possible orderings would also lead to a well-defined
p-value.
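
A minimal JDK-only sketch of both ideas follows, for illustration only. The class KsTieSketch and all of its methods are hypothetical and are not the Commons Math API. permutationP estimates the p-value by averaging over random permutations of the pooled sample with ties kept. jitteredP breaks ties with uniform noise and then averages that estimate over several jittered copies. It assumes the noise scale is smaller than the smallest gap between distinct observed values, so only tied values can change relative order.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch of two tie-handling strategies; not the Commons Math API. */
public class KsTieSketch {

    /** Two-sample KS statistic D = max |F_x(t) - F_y(t)| over all observed values t. */
    public static double dStatistic(double[] x, double[] y) {
        double[] xs = x.clone();
        double[] ys = y.clone();
        Arrays.sort(xs);
        Arrays.sort(ys);
        int i = 0;
        int j = 0;
        double d = 0.0;
        while (i < xs.length && j < ys.length) {
            double t = Math.min(xs[i], ys[j]);
            // Step past every copy of t in both samples, so tied values move both ECDFs together.
            while (i < xs.length && xs[i] == t) {
                i++;
            }
            while (j < ys.length && ys[j] == t) {
                j++;
            }
            d = Math.max(d, Math.abs((double) i / xs.length - (double) j / ys.length));
        }
        return d;
    }

    /** In-place Fisher-Yates shuffle. */
    static void shuffle(double[] a, Random rng) {
        for (int k = a.length - 1; k > 0; k--) {
            int r = rng.nextInt(k + 1);
            double tmp = a[k];
            a[k] = a[r];
            a[r] = tmp;
        }
    }

    /** Method 1: estimate the p-value over random permutations of the pooled data (ties kept). */
    public static double permutationP(double[] x, double[] y, int iterations, Random rng) {
        double observed = dStatistic(x, y);
        double[] pooled = new double[x.length + y.length];
        System.arraycopy(x, 0, pooled, 0, x.length);
        System.arraycopy(y, 0, pooled, x.length, y.length);
        int extreme = 0;
        for (int k = 0; k < iterations; k++) {
            shuffle(pooled, rng);
            double[] px = Arrays.copyOfRange(pooled, 0, x.length);
            double[] py = Arrays.copyOfRange(pooled, x.length, pooled.length);
            if (dStatistic(px, py) >= observed) {
                extreme++;
            }
        }
        return (double) extreme / iterations;
    }

    /** Method 2, step 1: add uniform noise in [-scale/2, scale/2) to every value to break ties. */
    public static double[] jitter(double[] v, double scale, Random rng) {
        double[] out = new double[v.length];
        for (int k = 0; k < v.length; k++) {
            out[k] = v[k] + (rng.nextDouble() - 0.5) * scale;
        }
        return out;
    }

    /** Method 2, step 2: average the (now tie-free) permutation p-value over several jittered copies. */
    public static double jitteredP(double[] x, double[] y, double scale,
                                   int jitters, int iterations, Random rng) {
        double sum = 0.0;
        for (int k = 0; k < jitters; k++) {
            sum += permutationP(jitter(x, scale, rng), jitter(y, scale, rng), iterations, rng);
        }
        return sum / jitters;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 2, 3};
        double[] y = {2, 3, 3, 4};
        Random rng = new Random(42);
        System.out.println("D = " + dStatistic(x, y));
        System.out.println("permutation p ~ " + permutationP(x, y, 10000, rng));
        System.out.println("jittered p ~ " + jitteredP(x, y, 1e-3, 50, 2000, rng));
    }
}
```

For this tied example the first line prints `D = 0.5`. Each jittered copy is tie-free, so averaging over jitters approximates averaging over all orderings of the ties. The trade-off is roughly jitters × iterations evaluations of D.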

Perhaps the user should be able to choose how ties are resolved?





> Kolmogorov-Smirnov 2-sample test does not correctly handle ties
> ---------------------------------------------------------------
>
>                 Key: MATH-1246
>                 URL: https://issues.apache.org/jira/browse/MATH-1246
>             Project: Commons Math
>          Issue Type: Bug
>            Reporter: Phil Steitz
>
> For small samples, KolmogorovSmirnovTest(double[], double[]) computes the
> exact distribution of the D-statistic for samples of sizes m and n, assuming
> no ties. No warning or special handling is provided when ties are present.
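
The issue notes that no warning is given when ties occur. A caller could run a minimal JDK-only check like the one below before relying on a no-ties distribution. TieCheck and hasTies are hypothetical names, not Commons Math API. The check sorts the pooled sample and compares adjacent values with exact `==`.

```java
import java.util.Arrays;

/** Hypothetical helper; not part of the Commons Math API. */
public class TieCheck {

    /**
     * Returns true if any value occurs more than once in the pooled sample,
     * i.e. within x, within y, or across the two samples. A no-ties
     * distribution of D assumes all m + n values are distinct.
     */
    public static boolean hasTies(double[] x, double[] y) {
        double[] pooled = new double[x.length + y.length];
        System.arraycopy(x, 0, pooled, 0, x.length);
        System.arraycopy(y, 0, pooled, x.length, y.length);
        Arrays.sort(pooled);
        // After sorting, equal values are adjacent.
        for (int k = 1; k < pooled.length; k++) {
            if (pooled[k] == pooled[k - 1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0, 3.0};
        double[] y = {2.0, 4.0};
        if (hasTies(x, y)) {
            // A real implementation might warn, throw, or switch to a tie-aware method here.
            System.err.println("Warning: ties present; a no-ties p-value may not be valid.");
        }
    }
}
```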



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
