[ https://issues.apache.org/jira/browse/MATH-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790915#comment-14790915 ]
Phil Steitz commented on MATH-1246:
-----------------------------------
I could be wrong on this, and I am OK with reverting the current exactP
tie-handling code and replacing it with the random jitter approach. I still
think the exact p can in fact be computed with ties present, but to do so you
have to view the combined sample as the empirical distribution representing
the (combined) population. You make a good point above about that being
dubious for small samples. I will continue to research this but, given the
lack of consensus, I will remove the implementation from the code.
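For reference, here is a minimal sketch of one possible reading of the "combined sample as population" idea: enumerate every split of the pooled data into sets of size n and m, and report the fraction of splits whose D is at least the observed D. The class and method names are hypothetical, this is not the code being reverted, and the brute-force enumeration is only feasible for very small n + m.

```java
import java.util.Arrays;

/** Sketch only: hypothetical names, not the Commons Math API. */
final class KsPermutationSketch {

    /** Two-sample D: max |F_x(v) - F_y(v)| over pooled values v; ties are stepped over together. */
    static double dStatistic(double[] x, double[] y) {
        double[] sx = x.clone();
        double[] sy = y.clone();
        Arrays.sort(sx);
        Arrays.sort(sy);
        int i = 0;
        int j = 0;
        double d = 0;
        while (i < sx.length && j < sy.length) {
            double v = Math.min(sx[i], sy[j]);
            // Advance both empirical CDFs past every observation equal to v.
            while (i < sx.length && sx[i] <= v) i++;
            while (j < sy.length && sy[j] <= v) j++;
            d = Math.max(d, Math.abs((double) i / sx.length - (double) j / sy.length));
        }
        return d;
    }

    /** Fraction of all n-m splits of the pooled sample with D >= observed D. */
    static double permutationP(double[] x, double[] y) {
        int n = x.length;
        int m = y.length;
        int total = n + m;
        if (n == 0 || m == 0 || total > 20) {
            throw new IllegalArgumentException("sketch needs non-empty samples with n + m <= 20");
        }
        double[] all = new double[total];
        System.arraycopy(x, 0, all, 0, n);
        System.arraycopy(y, 0, all, n, m);
        double observed = dStatistic(x, y);
        long splits = 0;
        long hits = 0;
        for (int mask = 0; mask < (1 << total); mask++) {
            if (Integer.bitCount(mask) != n) continue; // keep only masks that pick n elements
            double[] a = new double[n];
            double[] b = new double[m];
            int ia = 0;
            int ib = 0;
            for (int k = 0; k < total; k++) {
                if (((mask >> k) & 1) == 1) a[ia++] = all[k]; else b[ib++] = all[k];
            }
            splits++;
            // Small tolerance (an arbitrary choice) guards against rounding in the CDF differences.
            if (dStatistic(a, b) >= observed - 1e-12) hits++;
        }
        return (double) hits / splits;
    }
}
```

With tied data some splits are indistinguishable from one another, which is exactly where the small-sample objection above applies.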
So let's see if we can agree on:
# Add a non-naive exactP to handle small samples with no ties. Extend it to
n * m = 10000 as the default behavior (this is the cutoff that R uses). Beyond
this point, use the K-S distribution, so we no longer need monteCarloP for
moderate-size samples.
# Implement the jitter method and use it by default in the small-sample case
to break ties. Until we have eliminated the need for monteCarloP as a default,
use jitter to break ties for moderate sample sizes and use monteCarloP as is
post-jitter.
Optionally, implement a ks.boot-like monteCarloP that works with tied data.
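To make the two items above concrete, here is a rough sketch of the jitter step and the n * m cutoff. All names are hypothetical and nothing here is existing Commons Math API. The jitter width (half the smallest gap between distinct pooled values) and the strict inequality at the 10000 boundary are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Random;

/** Sketch only: hypothetical names, not the Commons Math API. */
final class KsTieJitterSketch {

    /** Assumed cutoff from the proposal: exact p-value while n * m is below this. */
    static final long EXACT_PRODUCT_LIMIT = 10000L;

    /** Returns the two samples merged into one sorted array. */
    static double[] sortedCombined(double[] x, double[] y) {
        double[] all = new double[x.length + y.length];
        System.arraycopy(x, 0, all, 0, x.length);
        System.arraycopy(y, 0, all, x.length, y.length);
        Arrays.sort(all);
        return all;
    }

    /** True if any value occurs more than once in the pooled sample. */
    static boolean hasTies(double[] x, double[] y) {
        double[] all = sortedCombined(x, y);
        for (int i = 1; i < all.length; i++) {
            if (all[i] == all[i - 1]) return true;
        }
        return false;
    }

    /** Smallest positive gap between distinct pooled values; 1.0 if there is none. */
    static double minGap(double[] x, double[] y) {
        double[] all = sortedCombined(x, y);
        double gap = Double.POSITIVE_INFINITY;
        for (int i = 1; i < all.length; i++) {
            double diff = all[i] - all[i - 1];
            if (diff > 0 && diff < gap) gap = diff;
        }
        return Double.isInfinite(gap) ? 1.0 : gap;
    }

    /** Adds uniform noise in [-halfWidth, halfWidth) to each element, in place. */
    static void jitter(double[] data, double halfWidth, Random rng) {
        for (int i = 0; i < data.length; i++) {
            data[i] += (2 * rng.nextDouble() - 1) * halfWidth;
        }
    }

    /** Returns jittered copies {x', y'} if ties are present; the inputs are left unchanged. */
    static double[][] breakTies(double[] x, double[] y, Random rng) {
        double[] xj = x.clone();
        double[] yj = y.clone();
        if (hasTies(xj, yj)) {
            // Noise below half the smallest gap keeps distinct values in their original order.
            double halfWidth = minGap(xj, yj) / 2;
            jitter(xj, halfWidth, rng);
            jitter(yj, halfWidth, rng);
        }
        return new double[][] {xj, yj};
    }

    /** Item 1: exact p below the cutoff, K-S distribution otherwise. */
    static boolean useExactP(int n, int m) {
        return (long) n * m < EXACT_PRODUCT_LIMIT;
    }
}
```

A single pass can in principle leave ties in place (for example when the gap is tiny relative to the magnitude of the data), so a real implementation would probably re-check with hasTies and jitter again if needed.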
> Kolmogorov-Smirnov 2-sample test does not correctly handle ties
> ---------------------------------------------------------------
>
> Key: MATH-1246
> URL: https://issues.apache.org/jira/browse/MATH-1246
> Project: Commons Math
> Issue Type: Bug
> Reporter: Phil Steitz
>
> For small samples, KolmogorovSmirnovTest(double[], double[]) computes the
> distribution of a D-statistic for m-n sets with no ties. No warning or
> special handling is delivered in the presence of ties.