[jira] [Resolved] (STATISTICS-69) Add an unconditioned exact test for 2x2 contingency tables

Alex Herbert (Jira) Wed, 05 Apr 2023 05:27:06 -0700


     [ 
https://issues.apache.org/jira/browse/STATISTICS-69?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Alex Herbert resolved STATISTICS-69.
------------------------------------
    Resolution: Implemented

Added in commit:

60b26ebca6a46f80651074daf895910051092c65

> Add an unconditioned exact test for 2x2 contingency tables
> ----------------------------------------------------------
>
>                 Key: STATISTICS-69
>                 URL: https://issues.apache.org/jira/browse/STATISTICS-69
>             Project: Commons Statistics
>          Issue Type: New Feature
>          Components: inference
>            Reporter: Alex Herbert
>            Priority: Minor
>             Fix For: 1.1
>
>
> A 2x2 contingency table [[a, b], [c, d]] is used to visualize N independent 
> observations of two binary variables (G or g and H or h):
>  
> {noformat}
>     G g
>   -------
> H | a b | m
> h | c d | n
>   -------
>     s r | N
> {noformat}
> The probability distributions are classified into 3 cases:
>  # The row and column sums are fixed in advance. All table entries are 
> determined by a, which follows a hypergeometric distribution with parameters 
> N, m, s.
>  # The row sums are fixed, but the column sums are not. All table entries are 
> determined by a and c. The distribution is a joint binomial distribution with 
> probabilities p0 and p1:
> a ~ B(m, p0); c ~ B(n, p1)
>  # Only the total N is fixed (row and column sums are not). The table (a, b, 
> c, d) follows a multinomial distribution.
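The first two cases can be sketched in a few lines of Python. This is only an illustration of the distributions named above, not code from the commit; the function names are hypothetical and the symbols (a, c, m, n, s, N, p0, p1) follow the table in the description.

```python
from math import comb

def hypergeom_pmf(a, N, m, s):
    """Case 1: P(a) when row sum m, column sum s and total N are all fixed.

    Valid for max(0, s - (N - m)) <= a <= min(m, s).
    """
    return comb(m, a) * comb(N - m, s - a) / comb(N, s)

def binom_pmf(k, size, p):
    """Binomial probability of k successes in `size` trials."""
    return comb(size, k) * p**k * (1 - p) ** (size - k)

def joint_binom_pmf(a, c, m, n, p0, p1):
    """Case 2: P(a, c) when only the row sums m and n are fixed.

    a ~ B(m, p0) and c ~ B(n, p1) are independent.
    """
    return binom_pmf(a, m, p0) * binom_pmf(c, n, p1)
```

Under the null hypothesis of case 2 the two probabilities are equal (p0 = p1), which leaves a single unknown parameter.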
> Case 1 is covered by using Fisher's exact test (see [STATISTICS-64]). This 
> case does not occur often in practice, as it requires both the column and row 
> sums to be fixed in advance. This is an exact conditioned test (as it 
> conditions on both the row and column sums).
> Case 2 is more common: the row sums are fixed but the column sums are not. 
> For example, a clinical trial with two groups of fixed size (e.g. medication 
> or placebo); the outcome of cure or no cure for each of the patients is 
> not known in advance.
> Case 3 is rare. For example, flipping two coins N times and totalling the 
> heads/tails for each independently.
> I propose adding an unconditioned exact test. Case 2 is the more common and 
> simpler to support. It involves generating a test statistic for each possible 
> table given the fixed totals. The p-value is obtained from the subset of 
> possible tables whose test statistic is at least as extreme as that of the 
> observed table. Alternatively the subset is built by incrementally adding 
> candidates based on which next sized subset has the smallest p-value. This is 
> the CSM (Convexity, Symmetry, Minimization) test of Barnard (1945). This is 
> computationally expensive and benefits from precomputed tables which rank the 
> order of tables for a given size (m, n). In either case the computation of 
> the p-value involves maximising the p-value over a nuisance parameter in the 
> range (0, 1): the common probability p0 = p1 under the null hypothesis.
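A minimal sketch of the statistic-based approach for case 2, assuming a two-sided Z-pooled statistic and a plain uniform grid over the nuisance parameter. The function names and the grid size are hypothetical choices for illustration; this is not the code added in the commit and makes no attempt to match the R or SciPy results digit for digit.

```python
from math import comb, sqrt

def z_pooled(x, y, m, n):
    """Z statistic with pooled variance for x of m versus y of n successes."""
    pooled = (x + y) / (m + n)
    if pooled in (0.0, 1.0):
        return 0.0  # no variation in the sample: treat as no evidence
    return (x / m - y / n) / sqrt(pooled * (1 - pooled) * (1 / m + 1 / n))

def unconditioned_p_value(a, c, m, n, grid=200):
    """Two-sided p-value maximised over a grid of nuisance parameter values."""
    z_obs = abs(z_pooled(a, c, m, n))
    tol = 1e-12  # guard against floating-point ties
    # All tables (x, y) at least as extreme as the observed table.
    extreme = [(x, y)
               for x in range(m + 1) for y in range(n + 1)
               if abs(z_pooled(x, y, m, n)) >= z_obs - tol]

    def p_at(pi):
        # Probability of the extreme set when p0 = p1 = pi.
        return sum(comb(m, x) * comb(n, y)
                   * pi ** (x + y) * (1 - pi) ** (m + n - x - y)
                   for x, y in extreme)

    # Uniform interior points of (0, 1); the end points are excluded.
    return max(p_at(i / (grid + 1)) for i in range(1, grid + 1))
```

Here a and c are the first-column counts of the two rows, and m and n are the fixed row sums. A grid alone slightly underestimates the true maximum, which is why a refinement step is usually added.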
> Possible test statistics are Fisher's p-value for the table (known as 
> Boschloo's test (1970)), or a Z-pooled or Z-unpooled statistic. 
> Implementation of the CSM test is computationally intense.
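To illustrate using Fisher's p-value as the test statistic, here is a hedged one-sided sketch (alternative: the first row has the larger probability). The names fisher_greater and boschloo_greater are hypothetical, and the nuisance parameter is again searched on a simple uniform grid.

```python
from math import comb

def fisher_greater(x, y, m, n):
    """One-sided Fisher p-value P(X >= x) given the column sum s = x + y."""
    s = x + y
    upper = min(m, s)
    return sum(comb(m, k) * comb(n, s - k)
               for k in range(x, upper + 1)) / comb(m + n, s)

def boschloo_greater(a, c, m, n, grid=200):
    """Unconditioned p-value using Fisher's p-value to order the tables."""
    f_obs = fisher_greater(a, c, m, n)
    # A smaller Fisher p-value means a more extreme table.
    extreme = [(x, y)
               for x in range(m + 1) for y in range(n + 1)
               if fisher_greater(x, y, m, n) <= f_obs * (1 + 1e-12)]

    def p_at(pi):
        # Probability of the extreme set when p0 = p1 = pi.
        return sum(comb(m, x) * comb(n, y)
                   * pi ** (x + y) * (1 - pi) ** (m + n - x - y)
                   for x, y in extreme)

    return max(p_at(i / (grid + 1)) for i in range(1, grid + 1))
```

The resulting p-value is never larger than the Fisher p-value of the observed table, which is the usual argument for preferring this test when only the row sums are fixed.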
> There is a reference implementation in R as the Exact package:
> [https://cran.r-project.org/web/packages/Exact/Exact.pdf]
> SciPy has implementations of Boschloo's test and the Z-pooled/unpooled test 
> (which they name Barnard's test):
> [https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.boschloo_exact.html]
> [https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.barnard_exact.html]
> Note that the search for the nuisance parameter involves a univariate 
> function with multiple local maxima. The implementations in R and SciPy both 
> use multiple start points to find candidate locations for the maximum. This 
> is done by evaluating the function at a number of uniformly spaced points in 
> (0, 1) and then (optionally) optimising the best candidate to find the 
> maximum. The function has no convenient analytic derivative (a gradient 
> would require numerical differentiation), so it is suited to a non-derivative 
> method such as Brent optimisation for the univariate case.
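The grid-then-refine search could look roughly like the sketch below. To stay dependency-free it swaps in a golden-section search for Brent's method; both are derivative-free, but this is an assumption for illustration and not what R, SciPy or the commit actually do. The function name and defaults are hypothetical.

```python
from math import sqrt

def maximise_multistart(f, points=50, tol=1e-8):
    """Maximise f on (0, 1): uniform grid first, then local refinement.

    Returns (x, f(x)). Assumes f has a single maximum between the grid
    neighbours of the best grid point.
    """
    step = 1.0 / (points + 1)
    xs = [i * step for i in range(1, points + 1)]
    best = max(xs, key=f)
    # Bracket the best candidate with its grid neighbours.
    lo, hi = max(best - step, 0.0), min(best + step, 1.0)
    inv_phi = (sqrt(5) - 1) / 2
    x1 = hi - inv_phi * (hi - lo)
    x2 = lo + inv_phi * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:
            # The maximum lies in [x1, hi].
            lo, x1, f1 = x1, x2, f2
            x2 = lo + inv_phi * (hi - lo)
            f2 = f(x2)
        else:
            # The maximum lies in [lo, x2].
            hi, x2, f2 = x2, x1, f1
            x1 = hi - inv_phi * (hi - lo)
            f1 = f(x1)
    x = (lo + hi) / 2
    fx = f(x)
    # Never return something worse than the best grid point.
    return (x, fx) if fx >= f(best) else (best, f(best))
```

For example, passing the p-value as a function of the nuisance parameter returns the location and value of the refined maximum. Increasing the number of grid points lowers the risk of refining the wrong local maximum.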
> See also:
> [https://en.wikipedia.org/wiki/Boschloo%27s_test]
> [https://en.wikipedia.org/wiki/Barnard%27s_test]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
