[ https://issues.apache.org/jira/browse/MATH-1627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Herbert resolved MATH-1627.
--------------------------------
    Fix Version/s: 4.0
       Resolution: Fixed

Throw an exception if a column or row contains only zeros.

Updated in commit:

21f80081082ce3b31a1bcd8ecae0e3ae9ac70c05
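The resolution rejects any table that has an all-zero row or column. Below is a minimal sketch of such a check. It is not the code from the commit: the class and method names are hypothetical, it assumes a rectangular table of non-negative counts, and it throws a plain IllegalArgumentException as a stand-in for whatever exception type the actual fix uses.

```java
/** Hypothetical helper: rejects contingency tables with an all-zero row or column. */
public final class ContingencyTableCheck {
    private ContingencyTableCheck() {}

    /**
     * Throws IllegalArgumentException if any row or column sums to zero.
     * Assumes a rectangular table with non-negative counts.
     */
    public static void checkNoZeroRowOrColumn(long[][] counts) {
        final int nRows = counts.length;
        final int nCols = counts[0].length;
        final long[] rowSum = new long[nRows];
        final long[] colSum = new long[nCols];
        // Accumulate row and column totals in a single pass.
        for (int r = 0; r < nRows; r++) {
            for (int c = 0; c < nCols; c++) {
                rowSum[r] += counts[r][c];
                colSum[c] += counts[r][c];
            }
        }
        for (int r = 0; r < nRows; r++) {
            if (rowSum[r] == 0) {
                throw new IllegalArgumentException("Row " + r + " contains only zeros");
            }
        }
        for (int c = 0; c < nCols; c++) {
            if (colSum[c] == 0) {
                throw new IllegalArgumentException("Column " + c + " contains only zeros");
            }
        }
    }

    public static void main(String[] args) {
        // The all-zero 2x2 table from the issue is rejected.
        try {
            checkNoZeroRowOrColumn(new long[2][2]);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // prints "Row 0 contains only zeros"
        }
        // A table with no zero row or column passes silently.
        checkNoZeroRowOrColumn(new long[][] {{10, 20}, {30, 40}});
    }
}
```

Checking every row and column is stricter than checking only the grand total: it also rejects the R example from the report, array(c(1,0,0,0), dim = c(2,2)), whose second row and column are both zero.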

 

> ChiSquareTest computes NaN with zero observations
> -------------------------------------------------
>
>                 Key: MATH-1627
>                 URL: https://issues.apache.org/jira/browse/MATH-1627
>             Project: Commons Math
>          Issue Type: Bug
>    Affects Versions: 4.0
>            Reporter: Alex Herbert
>            Priority: Trivial
>             Fix For: 4.0
>
>
> Passing a table of zero observations to ChiSquareTest computes NaN:
> {code:java}
> ChiSquareTest chi2Test = new ChiSquareTest();
> final long[][] counts = new long[2][2];
> // Returns NaN
> double chi2 = chi2Test.chiSquare(counts);
> {code}
> This is due to a division by zero (0/0) when computing the expected count for
> each cell. This bug was identified by SonarCloud analysis.
> The unit tests use R as a reference. In R this case raises an error stating that
> at least one entry must be positive. Setting one value to 1 allows R to compute
> a chi-squared test statistic, but the value is not valid:
> {code:r}
> > m <- array(c(1,0,0,0), dim = c(2,2))
> > chisq.test(m)
>       Pearson's Chi-squared test
> data:  m
> X-squared = NaN, df = 1, p-value = NA
> Warning message:
> In chisq.test(m) : Chi-squared approximation may be incorrect
> {code}
> Other methods in ChiSquareTest raise a ZeroException if an entire array of
> observations is zero, or if both observations in a bin are zero.
> The chi-square test makes assumptions that do not hold when the number of
> observations is small. There is no fixed minimum number of observations per
> category; the document referenced in the code javadoc recommends an expected
> count of at least 5 per bin. To avoid imposing limits on the sample size, the
> suggested fix is to raise a ZeroException if the sum of all counts is zero.
> This avoids the NaN computation. Choosing a suitable number of observations
> is left to the caller.
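To show where the NaN in the report comes from, the sketch below computes Pearson's statistic with the textbook formula, expected = rowSum * colSum / total, summing (observed - expected)^2 / expected over all cells, with no input validation. It is an illustrative assumption, not the Commons Math source, and the class name is hypothetical. For the all-zero table every expected count is 0/0. For the R example {{1, 0}, {0, 0}} the total is 1, but the cells in the zero row or column have observed = expected = 0, so their term is 0/0.

```java
/** Hypothetical sketch of Pearson's chi-square statistic for a contingency table. */
public final class PearsonChiSquareSketch {
    private PearsonChiSquareSketch() {}

    /**
     * Sum over cells of (observed - expected)^2 / expected, where
     * expected = rowSum * colSum / total. No validation is done, so zero
     * sums propagate to NaN as described in the issue.
     */
    public static double chiSquare(long[][] counts) {
        final int nRows = counts.length;
        final int nCols = counts[0].length;
        final double[] rowSum = new double[nRows];
        final double[] colSum = new double[nCols];
        double total = 0;
        for (int r = 0; r < nRows; r++) {
            for (int c = 0; c < nCols; c++) {
                rowSum[r] += counts[r][c];
                colSum[c] += counts[r][c];
                total += counts[r][c];
            }
        }
        double sum = 0;
        for (int r = 0; r < nRows; r++) {
            for (int c = 0; c < nCols; c++) {
                // NaN when total == 0 (0/0); exactly 0 when this cell's row or column sum is 0.
                final double expected = rowSum[r] * colSum[c] / total;
                final double dev = counts[r][c] - expected;
                // If expected is 0 then dev is also 0, so this term is 0/0 = NaN.
                sum += dev * dev / expected;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(chiSquare(new long[2][2]));                // prints "NaN"
        System.out.println(chiSquare(new long[][] {{1, 0}, {0, 0}})); // prints "NaN"
        System.out.println(chiSquare(new long[][] {{10, 20}, {30, 40}}));
    }
}
```

The second case suggests why a check on the grand total alone would not remove every NaN, which is consistent with the resolution checking each row and column instead.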



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
