[
https://issues.apache.org/jira/browse/MATH-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604853#comment-14604853
]
Phil Steitz commented on MATH-1179:
-----------------------------------
Sorry, Thomas, I was not clear about the Monte Carlo improvement. It is
uniformly an improvement - a much more efficient way to do the simulation. The
problem is that, for many problem instances, the results are not consistent
from run to run without a huge number of iterations. That is a limitation of
the algorithm, not the implementation. I will see if I can dig up some old
examples or generate new ones. You can see it by doing multiple runs with a
moderate number of iterations and comparing results. Unfortunately, the
estimates are not very stable for exactly the class of problem instances the
method is being used for.
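
To make the "multiple runs, compare results" check concrete, here is a minimal
sketch in Python. It is not the Commons Math monteCarloP code: it assumes a
plain label-shuffling simulation under the null hypothesis with no ties, and
the function names, the sample sizes (5 and 45, taken from the issue
description), the observed statistic d = 0.6 and the iteration count are all
chosen only for illustration.

```python
import random


def ks_statistic_from_labels(labels, n, m):
    """Two-sample KS statistic D for a sorted pooled sample without ties.

    labels[i] is True when the i-th smallest pooled value comes from the
    first sample (size n) and False when it comes from the second (size m).
    Integer steps of +m and -n keep the running difference exact; dividing
    by n * m at the end turns it into a difference of empirical CDFs.
    """
    running = 0
    largest = 0
    for from_first in labels:
        running += m if from_first else -n
        largest = max(largest, abs(running))
    return largest / (n * m)


def monte_carlo_p(d, n, m, iterations, rng):
    """Estimate P(D >= d) under the null by shuffling sample labels."""
    labels = [True] * n + [False] * m
    hits = 0
    for _ in range(iterations):
        rng.shuffle(labels)
        # Small tolerance so that simulated values equal to d count as hits.
        if ks_statistic_from_labels(labels, n, m) >= d - 1e-12:
            hits += 1
    return hits / iterations


if __name__ == "__main__":
    n, m, d = 5, 45, 0.6      # illustrative values, not taken from a real run
    iterations = 2000         # a "moderate" number of iterations
    estimates = [
        monte_carlo_p(d, n, m, iterations, random.Random(seed))
        for seed in range(5)
    ]
    print("estimates:", estimates)
    print("spread   :", max(estimates) - min(estimates))
```

Running the script with different seeds or iteration counts shows how much
the estimate moves between runs; comparing the spread against the size of
the p-value itself is one way to judge whether the iteration count is enough.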
One question for you, Thomas. I got stuck when implementing the 2-sample test
because I could not convince myself that the analysis in the Simard-L'Ecuyer
paper applied to the 2-sample case. I would have proceeded as you are
recommending if I could have convinced myself of that (and found a way to
reduce n, m to a single n). I tried to do it for the one-sample case, but
could not convince myself that the 2-sample case could be viewed the same way.
I am probably missing something very simple here. Can you explain how exactly
the results apply? Sorry I am being a little dense here. I would be happy if I
could convince myself that, at least for the case n = m, the analysis applies
directly, or that we can somehow use some function of m and n.
Regarding the failing test, it may not actually be a problem. The test checks
whether a 2-sample KS statistic equal to a critical value from the reference
table in [1] gives the expected p-value. The table only provides 2 digits of
accuracy in the critical values, and the test may be failing spuriously
because of that. How bad are the failures (how far off are the results from
the expected values)?
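
As a rough illustration of how much two-digit rounding can matter, the sketch
below uses the large-sample (asymptotic) Kolmogorov distribution. That is an
assumption made for illustration only: the failing test may use an exact or
simulated distribution, and the function names here are hypothetical. It
computes the leading-term coefficient c(alpha) = sqrt(-ln(alpha/2)/2), rounds
it to two decimals, and compares the resulting p-values.

```python
import math


def kolmogorov_sf(lam, terms=100):
    """Asymptotic tail probability P(sqrt(n*m/(n+m)) * D > lam).

    Uses the series 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 * k^2 * lam^2),
    which converges quickly for lam around 1 or larger.
    """
    return 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, terms + 1)
    )


def critical_coefficient(alpha):
    """Leading-term approximation c(alpha) = sqrt(-ln(alpha / 2) / 2)."""
    return math.sqrt(-0.5 * math.log(alpha / 2.0))


if __name__ == "__main__":
    alpha = 0.05
    c_full = critical_coefficient(alpha)
    c_rounded = round(c_full, 2)  # two decimals, as in a printed table
    print("c(alpha)          :", c_full)
    print("rounded c(alpha)  :", c_rounded)
    print("p at full c       :", kolmogorov_sf(c_full))
    print("p at rounded c    :", kolmogorov_sf(c_rounded))
```

If the test compares the computed p-value against alpha with a tight
tolerance, a shift of this kind could be enough to trip the assertion even
when the implementation is correct.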
I am sorry I have not been able to really figure out what is going on in R or
the other references you mention. I have not been able to find a good
reference covering the 2-sample case in detail. I will keep looking.
References welcome!
[1] https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test
(section on 2-sample test)
> kolmogorovSmirnovTest poor performance in monteCarloP method
> ------------------------------------------------------------
>
> Key: MATH-1179
> URL: https://issues.apache.org/jira/browse/MATH-1179
> Project: Commons Math
> Issue Type: Bug
> Reporter: Gilad
> Fix For: 4.0
>
> Attachments: KSTest-JavaAndR.txt, KSTestSnippet.txt
>
>
> I'm using the kolmogorovSmirnovTest method to calculate p-values.
> However, when I try running the test on two double[] of sizes 5 and 45 the
> results take over 10 seconds to calculate.
> This seems very long, whereas in R it takes a few milliseconds for the same
> calculation.
> I'd be very happy to hear any comment you may have on the subject.
> Gilad