[ https://issues.apache.org/jira/browse/MATH-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15453154#comment-15453154 ]
Kexin Xie edited comment on MATH-1381 at 8/31/16 7:35 PM:
----------------------------------------------------------
Hi [~erans], thanks for looking at the PR. I agree that this does look like a
dirty fix that could mask a potential bug in the computation.
However, the main problem is a corner case that the current algorithm does not
consider: if the probability is large enough, the number of successes equals
the number of trials, and both numbers are small enough, {{criticalValueLow}}
rises quickly until it equals {{criticalValueHigh}}. The if condition at L138
is supposed to handle the symmetric case {{pLow == pHigh}}, but it does not
account for the case {{criticalValueLow == criticalValueHigh}}. At that point
the computed p-value always rises above 1.
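To illustrate the corner case, here is a minimal, self-contained sketch of a two-sided accumulation loop of the kind described above. It is an assumption-laden reconstruction, not the Commons Math source: the class {{TwoSidedBinomialSketch}}, the method names, and the hand-written {{pmf}} helper are hypothetical, and only the names {{pLow}}, {{pHigh}}, {{criticalValueLow}} and {{criticalValueHigh}} come from the discussion. With the inputs from the report, both indices meet at 200, the {{pLow == pHigh}} branch fires, and the term at that index is added twice.

```java
// Hypothetical sketch for illustration only; not the Commons Math implementation.
final class TwoSidedBinomialSketch {

    /** Binomial P(X = k) for n trials and success probability 0 < p < 1, via logs. */
    static double pmf(int n, int k, double p) {
        double logCoeff = 0.0;
        for (int i = 1; i <= k; i++) {
            // log C(n, k) = sum over i of log((n - k + i) / i)
            logCoeff += Math.log(n - k + i) - Math.log(i);
        }
        return Math.exp(logCoeff + k * Math.log(p) + (n - k) * Math.log1p(-p));
    }

    /** Two-sided accumulation without any guard for the two indices meeting. */
    static double twoSidedUnguarded(int n, int successes, double p) {
        int criticalValueLow = 0;
        int criticalValueHigh = n;
        double pTotal = 0.0;
        while (true) {
            double pLow = pmf(n, criticalValueLow, p);
            double pHigh = pmf(n, criticalValueHigh, p);
            if (pLow == pHigh) {
                // Intended for the symmetric case, but also taken when
                // criticalValueLow == criticalValueHigh (same index, same mass).
                pTotal += 2 * pLow;
                criticalValueLow++;
                criticalValueHigh--;
            } else if (pLow < pHigh) {
                pTotal += pLow;
                criticalValueLow++;
            } else {
                pTotal += pHigh;
                criticalValueHigh--;
            }
            if (criticalValueLow > successes || criticalValueHigh < successes) {
                break;
            }
        }
        return pTotal;
    }

    public static void main(String[] args) {
        // Inputs taken from the issue report; the result exceeds 1.
        System.out.println(twoSidedUnguarded(200, 200, 0.9950429));
    }
}
```

Under these assumptions the sketch returns a value of roughly 1.37 for (200, 200, 0.9950429), which is consistent with the number quoted in the report.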
It may seem like a dirty fix, but I have checked the results against R and
Python's scipy equivalent, and they produce the same values. I implemented it
this way because it handles this boundary condition correctly and is the
smallest change to the original implementation. Note that Python's scipy also
uses a similar approach to deal with an estimated value rising above 1:
https://github.com/scipy/scipy/blob/v0.14.0/scipy/stats/morestats.py#L1661
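As a rough sketch of the two options under discussion, the code below compares capping the result at 1 (the scipy-style approach mentioned above, assumed here to amount to a {{Math.min(1.0, ...)}} on the raw total) with a hypothetical alternative that counts the shared index only once. The class {{ClampedBinomialSketch}}, its methods, the {{countSharedIndexOnce}} flag and the {{pmf}} helper are all illustrative assumptions, not code from the PR or from Commons Math.

```java
// Hypothetical sketch for illustration only; not the PR or the Commons Math source.
final class ClampedBinomialSketch {

    /** Binomial P(X = k) for n trials and success probability 0 < p < 1, via logs. */
    static double pmf(int n, int k, double p) {
        double logCoeff = 0.0;
        for (int i = 1; i <= k; i++) {
            logCoeff += Math.log(n - k + i) - Math.log(i);
        }
        return Math.exp(logCoeff + k * Math.log(p) + (n - k) * Math.log1p(-p));
    }

    /**
     * Two-sided accumulation. When countSharedIndexOnce is true, the mass is
     * added once (not twice) if both critical values point at the same index.
     */
    static double rawTwoSided(int n, int successes, double p, boolean countSharedIndexOnce) {
        int criticalValueLow = 0;
        int criticalValueHigh = n;
        double pTotal = 0.0;
        while (true) {
            double pLow = pmf(n, criticalValueLow, p);
            double pHigh = pmf(n, criticalValueHigh, p);
            if (pLow == pHigh) {
                boolean sharedIndex = criticalValueLow == criticalValueHigh;
                pTotal += (countSharedIndexOnce && sharedIndex) ? pLow : 2 * pLow;
                criticalValueLow++;
                criticalValueHigh--;
            } else if (pLow < pHigh) {
                pTotal += pLow;
                criticalValueLow++;
            } else {
                pTotal += pHigh;
                criticalValueHigh--;
            }
            if (criticalValueLow > successes || criticalValueHigh < successes) {
                break;
            }
        }
        return pTotal;
    }

    /** Option 1: keep the loop unchanged and cap the result at 1. */
    static double clampedTwoSided(int n, int successes, double p) {
        return Math.min(1.0, rawTwoSided(n, successes, p, false));
    }

    /** Option 2 (hypothetical): avoid the double count inside the loop. */
    static double guardedTwoSided(int n, int successes, double p) {
        return rawTwoSided(n, successes, p, true);
    }

    public static void main(String[] args) {
        System.out.println(clampedTwoSided(200, 200, 0.9950429));
        System.out.println(guardedTwoSided(200, 200, 0.9950429));
    }
}
```

For the inputs in the report, the capped variant returns exactly 1, while the guarded variant returns 1 only up to floating-point rounding, so the two are not interchangeable in tests that compare with strict equality.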
I've also updated the PR with more exhaustive test cases; please have a look
again. I also think the current implementation is correct, as explained above,
but I'm happy to change the estimation algorithm if that's required.
> BinomialTest P-value > 1
> ------------------------
>
> Key: MATH-1381
> URL: https://issues.apache.org/jira/browse/MATH-1381
> Project: Commons Math
> Issue Type: Bug
> Reporter: Wang Qiang
>
> When I use the Binomial Test, I get a p-value > 1 for the two-sided check.
> Example:
> (new BinomialTest()).binomialTest(200, 200, 0.9950429,
> AlternativeHypothesis.TWO_SIDED) == 1.3701357550780435
> In my case, when the expected p-value is 1 (as calculated by a package in
> another language, scipy in this case), the p-value returned can be > 1.