[ 
https://issues.apache.org/jira/browse/MATH-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604853#comment-14604853
 ] 

Phil Steitz commented on MATH-1179:
-----------------------------------

Sorry, Thomas, I was not clear about the Monte Carlo improvement.  It is 
uniformly an improvement - a much more efficient way to do the simulation.  The 
problem is that the results are not consistent for a lot of problem instances 
without a huge number of iterations.  That is a limitation of the algorithm, 
not the implementation.  I will see if I can dig up some old examples or 
generate new ones.  You can see it by doing multiple runs with a moderate 
number of iterations and comparing results.  Unfortunately, the results are 
not very stable for exactly the class of problem instances it is being used 
for.
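
To make the "multiple runs" check concrete, here is a minimal, self-contained 
sketch.  It is not the Commons Math implementation: the class and method names 
are hypothetical, and it assumes the Monte Carlo estimate is built from random 
partitions of the n + m pooled positions, counting how often the simulated 
statistic reaches d.  The sizes 5 and 45 come from the issue report; d = 0.6, 
the iteration count and the seeds are arbitrary choices.

```java
import java.util.Arrays;
import java.util.Random;

/** Sketch only: run-to-run spread of a Monte Carlo two-sample KS p-value estimate. */
final class MonteCarloKsStability {

    /** Two-sample KS statistic D = sup |F_x - F_y|; inputs need not be sorted. */
    static double ksStatistic(double[] x, double[] y) {
        double[] sx = x.clone();
        double[] sy = y.clone();
        Arrays.sort(sx);
        Arrays.sort(sy);
        int i = 0;
        int j = 0;
        double d = 0.0;
        while (i < sx.length && j < sy.length) {
            double v = Math.min(sx[i], sy[j]);
            while (i < sx.length && sx[i] <= v) { i++; }
            while (j < sy.length && sy[j] <= v) { j++; }
            d = Math.max(d, Math.abs((double) i / sx.length - (double) j / sy.length));
        }
        return d;
    }

    /** Estimates P(D_{n,m} >= d) from random partitions of n + m positions (assumed scheme). */
    static double monteCarloP(double d, int n, int m, int iterations, Random rng) {
        boolean[] inFirst = new boolean[n + m];
        Arrays.fill(inFirst, 0, n, true);
        int hits = 0;
        for (int it = 0; it < iterations; it++) {
            // Fisher-Yates shuffle of the sample labels.
            for (int k = inFirst.length - 1; k > 0; k--) {
                int r = rng.nextInt(k + 1);
                boolean tmp = inFirst[k];
                inFirst[k] = inFirst[r];
                inFirst[r] = tmp;
            }
            int i = 0;
            int j = 0;
            double dSim = 0.0;
            for (boolean first : inFirst) {
                if (first) { i++; } else { j++; }
                dSim = Math.max(dSim, Math.abs((double) i / n - (double) j / m));
            }
            // Small tolerance so that ties with d are counted despite rounding.
            if (dSim >= d - 1e-12) { hits++; }
        }
        return (double) hits / iterations;
    }

    /** Repeats the estimate with different seeds so the results can be compared. */
    static double[] repeatedEstimates(double d, int n, int m, int iterations, int runs, long seed) {
        double[] estimates = new double[runs];
        for (int r = 0; r < runs; r++) {
            estimates[r] = monteCarloP(d, n, m, iterations, new Random(seed + r));
        }
        return estimates;
    }

    public static void main(String[] args) {
        double[] estimates = repeatedEstimates(0.6, 5, 45, 1000, 5, 1L);
        System.out.println(Arrays.toString(estimates));
    }
}
```

How far the printed estimates disagree depends on d, n, m and the iteration 
count, so the numbers are only meaningful relative to one another.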

One question for you, Thomas.  I got stuck when implementing the 2-sample test 
because I could not convince myself that the analysis in the Simard-L'Ecuyer 
paper applied to the 2-sample case.  I would have proceeded as you are 
recommending if I could have convinced myself of that (and found a way to 
reduce n,m to n).  I tried to do it for the one-sample case, but could not 
convince myself that the 2-sample case could be viewed the same way.  I am 
probably missing something very simple here.  Can you explain how exactly the 
results apply?  Sorry I am being a little dense here.  I would be happy if I 
could convince myself that at least for the case n = m, the analysis applies 
directly or somehow we can use some function of m,n.

Regarding the failing test, it may not actually be a problem.  What the test is 
doing is seeing if a 2-sample KS statistic equal to a critical value shown in 
the reference table in [1] gives the expected p-value.  The table only provides 
2 digits of accuracy in the critical values, and the test may be failing 
spuriously because of that.  How bad are the failures (how far off are the 
results from the expected values)?
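
As a rough illustration of how much two-digit rounding can matter, the sketch 
below compares the large-sample Kolmogorov tail probability at an unrounded 
coefficient with the value at the same coefficient rounded to two decimals.  
It rests on assumptions: it uses the standard asymptotic series 
Q(x) = 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 x^2) and the usual approximation 
c(alpha) = sqrt(-ln(alpha / 2) / 2), not whatever p-value method the failing 
test actually calls, and the class and method names are hypothetical.

```java
/** Sketch only: effect of rounding a KS critical coefficient to two decimals. */
final class KsTableRounding {

    /** Asymptotic Kolmogorov tail Q(x) = 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 x^2). */
    static double kolmogorovQ(double x) {
        if (x <= 0.0) {
            return 1.0;
        }
        double sum = 0.0;
        double sign = 1.0;
        for (int k = 1; k <= 100; k++) {
            double term = Math.exp(-2.0 * k * k * x * x);
            sum += sign * term;
            if (term < 1e-16) {
                break;
            }
            sign = -sign;
        }
        return Math.min(1.0, Math.max(0.0, 2.0 * sum));
    }

    /** Large-sample approximation of the coefficient c(alpha). */
    static double coefficient(double alpha) {
        return Math.sqrt(-0.5 * Math.log(alpha / 2.0));
    }

    /** Rounds to two decimals, mimicking a table with two digits after the point. */
    static double roundTo2(double value) {
        return Math.round(value * 100.0) / 100.0;
    }

    public static void main(String[] args) {
        double[] alphas = {0.10, 0.05, 0.01};
        for (double alpha : alphas) {
            double c = coefficient(alpha);
            double rounded = roundTo2(c);
            System.out.printf("alpha=%.3f c=%.4f Q(c)=%.5f rounded=%.2f Q(rounded)=%.5f%n",
                    alpha, c, kolmogorovQ(c), rounded, kolmogorovQ(rounded));
        }
    }
}
```

If the gap between the two Q values is of the same order as the test's 
tolerance, the rounding alone could account for the failures.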

I am sorry I have not been able to really figure out what is going on in R or 
the other references you mention.  I have not been able to find a good 
reference covering the 2-sample case in detail.  I will keep looking.  
References welcome!

[1] https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test
(section on 2-sample test)

> kolmogorovSmirnovTest poor performance in monteCarloP method
> ------------------------------------------------------------
>
>                 Key: MATH-1179
>                 URL: https://issues.apache.org/jira/browse/MATH-1179
>             Project: Commons Math
>          Issue Type: Bug
>            Reporter: Gilad
>             Fix For: 4.0
>
>         Attachments: KSTest-JavaAndR.txt, KSTestSnippet.txt
>
>
> I'm using the kolmogorovSmirnovTest method to calculate p-values.
> However, when I try running the test on two double[] of sizes 5 and 45 the 
> results take over 10 seconds to calculate.
> This seems very long, whereas in R it takes a few milliseconds for the same 
> calculation.
> I'd be very happy to hear any comment you may have on the subject.
>    Gilad



