[
https://issues.apache.org/jira/browse/MATH-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526424#comment-14526424
]
Thomas Neidhart commented on MATH-1179:
---------------------------------------
Basically all implementations that I checked do the following:
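{code}
public double approximateP(double d, int n, int m) {
    final double dm = m;
    final double dn = n;
    // square root of the effective sample size n*m / (n + m)
    final double en = FastMath.sqrt(dm * dn / (dm + dn));
    // this is added: small-sample correction of the scaling factor
    final double en2 = en + 0.12 + 0.11 / en;
    // asymptotic tail probability of the scaled statistic
    return 1 - ksSum(d * en2, KS_SUM_CAUCHY_CRITERION,
                     MAXIMUM_PARTIAL_SUM_COUNT);
}
{code}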
I could not find an explanation for this, but it is probably in one of the
referenced papers (see link below).
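A self-contained sketch may help to see what the extra line changes. It assumes the asymptotic Kolmogorov tail probability Q(lambda) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lambda^2), which, as far as I can tell, is what 1 - ksSum(...) evaluates in the snippet above. The class and method names (KsAsymptoticSketch, kolmogorovSurvival, approximatePUncorrected) are hypothetical and not part of Commons Math, and java.lang.Math is used instead of FastMath to keep it dependency-free.

{code}
/**
 * Hypothetical sketch (not the Commons Math implementation) of the asymptotic
 * two-sample KS p-value, with and without the en + 0.12 + 0.11 / en correction.
 */
final class KsAsymptoticSketch {

    private KsAsymptoticSketch() {}

    /**
     * Asymptotic tail probability Q(lambda) = 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 lambda^2).
     * Assumed to correspond to 1 - ksSum(lambda, ...) in the snippet above.
     */
    static double kolmogorovSurvival(double lambda) {
        // For lambda < 0.2, Q(lambda) differs from 1 by less than 1e-12, and the
        // alternating series converges slowly for small lambda, so return 1 directly.
        if (lambda < 0.2) {
            return 1.0;
        }
        double sum = 0.0;
        for (int k = 1; k <= 1000; k++) {
            double term = Math.exp(-2.0 * k * k * lambda * lambda);
            sum += (k % 2 == 1) ? term : -term;
            if (term < 1e-17) {
                break; // remaining terms are negligible in double precision
            }
        }
        // Clamp to [0, 1] against rounding error.
        return Math.min(1.0, Math.max(0.0, 2.0 * sum));
    }

    /** d: KS statistic; n, m: sample sizes. Uses the corrected scaling factor. */
    static double approximateP(double d, int n, int m) {
        final double dn = n;
        final double dm = m;
        final double en = Math.sqrt(dm * dn / (dm + dn));
        final double en2 = en + 0.12 + 0.11 / en; // the correction
        return kolmogorovSurvival(d * en2);
    }

    /** Same as approximateP, but scaling d by en alone (no correction). */
    static double approximatePUncorrected(double d, int n, int m) {
        final double en = Math.sqrt((double) n * m / (n + m));
        return kolmogorovSurvival(d * en);
    }
}
{code}

Since en + 0.12 + 0.11 / en is always larger than en, the corrected statistic is larger and the resulting p-value is never larger than the uncorrected one; the effect is largest for small samples, where en is small.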
The MATLAB file linked below also contains a criterion for when the asymptotic
p-value approximation is considered reasonably accurate:
{code}
(n*m) / (n + m) > 4
{code}
Link: https://github.com/ICEACE/MATLAB/blob/master/kstest2.m
The same correction is also applied in SciPy, and most likely in R as well, but
I have not checked yet.
Using this correction, we get nearly the same result as in R: 0.2198891183722148
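The MATLAB accuracy criterion quoted above could be expressed as a simple guard, for example to decide when the asymptotic approximation is acceptable. This is only a sketch: the class and method names are hypothetical, and the threshold of 4 is taken from the linked kstest2.m file. Note that the ratio has to be computed in floating point; in Java, (n*m) / (n + m) with int operands truncates (and n*m can overflow for large samples).

{code}
/**
 * Hypothetical helper for the rule of thumb from kstest2.m:
 * the asymptotic p-value is considered reasonably accurate when n*m / (n + m) > 4.
 */
final class KsAsymptoticGuard {

    private KsAsymptoticGuard() {}

    static boolean asymptoticApproximationReasonable(int n, int m) {
        // Compute in double: int arithmetic would truncate the quotient
        // (e.g. 5*21 / 26 == 4) and n * m could overflow.
        final double effectiveSize = (double) n * m / (n + m);
        return effectiveSize > 4.0;
    }
}
{code}

For the sample sizes in this report (5 and 45), n*m / (n + m) = 225 / 50 = 4.5, so the rule would accept the asymptotic approximation.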
> kolmogorovSmirnovTest poor performance in monteCarloP method
> ------------------------------------------------------------
>
> Key: MATH-1179
> URL: https://issues.apache.org/jira/browse/MATH-1179
> Project: Commons Math
> Issue Type: Bug
> Reporter: Gilad
> Fix For: 4.0
>
> Attachments: KSTest-JavaAndR.txt, KSTestSnippet.txt
>
>
> I'm using the kolmogorovSmirnovTest method to calculate p-values.
> However, when I run the test on two double[] arrays of sizes 5 and 45, the
> calculation takes over 10 seconds.
> This seems very long, whereas in R the same calculation takes a few
> milliseconds.
> I'd be very happy to hear any comments you may have on the subject.
> Gilad
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)