[ 
https://issues.apache.org/jira/browse/MATH-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739536#comment-14739536
 ] 

Otmar Ertl commented on MATH-1246:
----------------------------------

The Monte Carlo approach can be modified by simultaneously sampling D. Here is 
an outline of how this sampling could be achieved:
# First, determine the set of points P = (p_i) for which equal values exist in 
both samples.
# Determine the maximum difference of the CDFs over all values not included in P.
# Determine for each point p_i whether it is possible at all to get a CDF 
difference that is larger than the calculated maximum. If not, the point can 
be excluded from P. Otherwise, remember the difference of the CDFs d_i just 
before that point and the numbers of equal values n_i and m_i in the first and 
second sample, respectively.
# Within each Monte Carlo iteration, generate for each point p_i a random 
ordering of the n_i and m_i equal values (using a function similar to 
fillBooleanArrayRandomlyWithFixedNumberTrueValues). Determine the maximum 
differences of the CDFs at all points p_i using the random ordering and d_i, 
and take the maximum of them and the maximum calculated in step 2. This gives 
the sampled (observed) D-statistic, which is finally compared to curD.
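
The four steps above could be sketched roughly as follows. This is only an 
illustration under assumptions, not code from Commons Math: the class 
TieBreakingKsSketch, its fields and the method sampleD are hypothetical names, 
both samples are assumed to be non-empty and free of NaN, and CDF differences 
are stored as integers scaled by n*m to avoid rounding issues. The random 
ordering is drawn element by element instead of via 
fillBooleanArrayRandomlyWithFixedNumberTrueValues.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/** Hypothetical helper, not part of Commons Math. Assumes non-empty samples without NaN. */
class TieBreakingKsSketch {

    /** One tied point p_i: scaled CDF difference d_i just before it, tie counts n_i and m_i. */
    static final class TiePoint {
        final long dBefore;
        final int nTies;
        final int mTies;

        TiePoint(long dBefore, int nTies, int mTies) {
            this.dBefore = dBefore;
            this.nTies = nTies;
            this.mTies = mTies;
        }
    }

    final int n;                  // size of first sample
    final int m;                  // size of second sample
    final long fixedMax;          // step 2: max |F_x - F_y| * n * m over values not in P
    final List<TiePoint> ties = new ArrayList<>();  // step 3: points of P that still matter

    TieBreakingKsSketch(double[] x, double[] y) {
        double[] sx = x.clone();
        double[] sy = y.clone();
        Arrays.sort(sx);
        Arrays.sort(sy);
        n = sx.length;
        m = sy.length;
        List<TiePoint> candidates = new ArrayList<>();
        long diff = 0;            // (F_x - F_y) * n * m: each x adds m, each y subtracts n
        long max = 0;
        int i = 0;
        int j = 0;
        while (i < n || j < m) {
            // Smallest value not yet consumed from either sorted sample.
            double v = (j >= m || (i < n && sx[i] <= sy[j])) ? sx[i] : sy[j];
            int ni = 0;
            int mi = 0;
            while (i < n && sx[i] == v) { i++; ni++; }
            while (j < m && sy[j] == v) { j++; mi++; }
            boolean tied = ni > 0 && mi > 0;
            if (tied) {
                candidates.add(new TiePoint(diff, ni, mi));   // step 1
            }
            diff += (long) ni * m - (long) mi * n;
            if (!tied) {
                max = Math.max(max, Math.abs(diff));          // step 2
            }
        }
        fixedMax = max;
        for (TiePoint t : candidates) {
            // Step 3: extreme orderings put all x values first or all y values first.
            long reach = Math.max(Math.abs(t.dBefore + (long) t.nTies * m),
                                  Math.abs(t.dBefore - (long) t.mTies * n));
            if (reach > fixedMax) {
                ties.add(t);
            }
        }
    }

    /** Step 4: one sampled D-statistic under a uniformly random ordering of the ties. */
    double sampleD(Random rng) {
        long max = fixedMax;
        for (TiePoint t : ties) {
            long diff = t.dBefore;
            int nLeft = t.nTies;
            int mLeft = t.mTies;
            while (nLeft + mLeft > 0) {
                // Choosing x with probability nLeft / (nLeft + mLeft) yields a uniform ordering.
                if (rng.nextInt(nLeft + mLeft) < nLeft) {
                    diff += m;
                    nLeft--;
                } else {
                    diff -= n;
                    mLeft--;
                }
                max = Math.max(max, Math.abs(diff));
            }
        }
        return (double) max / ((double) n * m);
    }
}
```

The constructor runs once per pair of samples, and sampleD would then be 
called once per Monte Carlo iteration, with the caller comparing each result 
to curD. Whether this matches the intended definition of the statistic under 
ties is exactly the open question here.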

Anyway, we should find the right definition first before implementing anything.

> Kolmogorov-Smirnov 2-sample test does not correctly handle ties
> ---------------------------------------------------------------
>
>                 Key: MATH-1246
>                 URL: https://issues.apache.org/jira/browse/MATH-1246
>             Project: Commons Math
>          Issue Type: Bug
>            Reporter: Phil Steitz
>
> For small samples, KolmogorovSmirnovTest(double[], double[]) computes the 
> distribution of a D-statistic for m-n sets with no ties.  No warning or 
> special handling is delivered in the presence of ties.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
