[
https://issues.apache.org/jira/browse/STATISTICS-69?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alex Herbert resolved STATISTICS-69.
------------------------------------
Resolution: Implemented
Added in commit:
60b26ebca6a46f80651074daf895910051092c65
> Add an unconditioned exact test for 2x2 contingency tables
> ----------------------------------------------------------
>
> Key: STATISTICS-69
> URL: https://issues.apache.org/jira/browse/STATISTICS-69
> Project: Commons Statistics
> Issue Type: New Feature
> Components: inference
> Reporter: Alex Herbert
> Priority: Minor
> Fix For: 1.1
>
>
> A 2x2 contingency table [[a, b], [c, d]] is used to visualize N independent
> observations of two binary variables (G or g and H or h):
>
> {noformat}
>     G   g
>   ---------
> H | a   b | m
> h | c   d | n
>   ---------
>     s   r | N{noformat}
> The probability distributions are classified into 3 cases:
> # The row and column sums are fixed in advance. All table entries are
> determined by a. This follows a hypergeometric distribution with parameters
> N, m, s.
> # The row sums are fixed, but the column sums are not. All table entries are
> determined by a and c. The distribution is a joint binomial distribution with
> probabilities p0 and p1:
> a ~ B(m, p0); c ~ B(n, p1)
> # Only the total N is fixed (row and column sums are not). The table (a, b,
> c, d) follows a multinomial distribution.
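The three sampling models above can be written down directly. The following is a minimal sketch using only the Python standard library; the function names are illustrative and are not part of Commons Statistics or any other library.

```python
from math import comb, factorial


def hypergeom_pmf(a, N, m, s):
    """Case 1: probability of cell a when row sum m and column sum s are fixed."""
    return comb(s, a) * comb(N - s, m - a) / comb(N, m)


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def joint_binom_pmf(a, c, m, n, p0, p1):
    """Case 2: independent rows with a ~ B(m, p0) and c ~ B(n, p1)."""
    return binom_pmf(a, m, p0) * binom_pmf(c, n, p1)


def multinomial_pmf(a, b, c, d, probs):
    """Case 3: only N = a + b + c + d is fixed; probs holds the four cell probabilities."""
    N = a + b + c + d
    coef = factorial(N) // (factorial(a) * factorial(b) * factorial(c) * factorial(d))
    pa, pb, pc, pd = probs
    return coef * pa**a * pb**b * pc**c * pd**d
```

In case 1 a single count determines the table, in case 2 two counts are needed, and in case 3 three of the four counts are free.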
> Case 1 is covered by Fisher's exact test (see [STATISTICS-64]). It does not
> occur very often in practice, as it requires both the column and row sums to
> be fixed in advance. This is an exact conditioned test (as it conditions on
> the row and column sums).
> Case 2 is more common: the row sums are fixed but the column sums are not.
> For example, a clinical trial with two groups of fixed size (e.g. medication
> or placebo), where the outcome of cure or no cure for each of the patients is
> not known in advance.
> Case 3 is rare. For example flipping two coins N times and totalling the
> heads/tails for each independently.
> I propose adding an unconditioned exact test. Case 2 is the more common and
> simpler to support. It involves generating a test statistic for each possible
> table given the fixed totals. The p-value is obtained from the subset of
> possible tables whose test statistic is as or more extreme than that of the
> observed table. Alternatively, the subset is grown incrementally, at each
> step adding the candidate for which the next sized subset has the smallest
> p-value. This is the CSM (Convexity, Symmetry, Minimization) test of Barnard
> (1945). This is computationally expensive and benefits from precomputed
> tables which rank the order of tables for a given size (m, n). In either case
> the computation of the p-value involves maximising the p-value over a
> nuisance parameter in the range (0, 1).
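The first approach (a fixed test statistic, with the p-value maximised over the nuisance parameter) might be sketched as follows. This is an illustration under stated assumptions and not the code added in the commit: it uses a Z-pooled statistic, a two-sided alternative, and a plain uniform grid for the nuisance parameter; the function names and the `grid` parameter are hypothetical.

```python
from math import comb, sqrt


def z_pooled(a, c, m, n):
    """Difference in proportions scaled by the pooled standard error (0 if the variance is 0)."""
    p = (a + c) / (m + n)
    var = p * (1 - p) * (1 / m + 1 / n)
    return 0.0 if var == 0 else (a / m - c / n) / sqrt(var)


def unconditioned_p_value(a, c, m, n, grid=100):
    """Two-sided p-value maximised over a uniform grid of the nuisance parameter."""
    t_obs = abs(z_pooled(a, c, m, n))
    # Tables (i, j) with fixed row sums (m, n) that are as or more extreme
    # than the observed table; the small tolerance guards against ties.
    extreme = [(i, j) for i in range(m + 1) for j in range(n + 1)
               if abs(z_pooled(i, j, m, n)) >= t_obs - 1e-12]
    best = 0.0
    for k in range(1, grid):
        pi = k / grid  # common success probability under the null hypothesis
        p = sum(comb(m, i) * comb(n, j) * pi ** (i + j) * (1 - pi) ** (m + n - i - j)
                for i, j in extreme)
        best = max(best, p)
    return best
```

For example, `unconditioned_p_value(7, 0, 7, 7)` considers only the two fully separated tables as extreme. A grid alone can miss the true maximum between grid points, so a real implementation would likely refine the best candidate.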
> Possible test statistics are Fisher's p-value for the table (known as
> Boschloo's test (1970)), or a Z-pooled or Z-unpooled statistic.
> Implementation of the CSM test is computationally intense.
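Two of the candidate statistics can be sketched in a few lines. This is a hedged illustration: the one-sided direction of the Fisher p-value and the treatment of a zero variance in the unpooled statistic are assumptions made here, and the function names are hypothetical.

```python
from math import comb, copysign, inf, sqrt


def fisher_p_less(a, c, m, n):
    """One-sided Fisher p-value P(A <= a) with both margins fixed (Boschloo's statistic)."""
    s, N = a + c, m + n
    lo = max(0, s - n)  # smallest value of a consistent with the margins
    return sum(comb(m, i) * comb(n, s - i) for i in range(lo, a + 1)) / comb(N, s)


def z_unpooled(a, c, m, n):
    """Difference in proportions scaled by the unpooled standard error."""
    p0, p1 = a / m, c / n
    diff = p0 - p1
    var = p0 * (1 - p0) / m + p1 * (1 - p1) / n
    if var == 0:
        # Both proportions are 0 or 1: no evidence if equal, otherwise
        # treat the table as maximally extreme (an assumption of this sketch).
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / sqrt(var)
```

Either function can be used to rank the possible tables. With the Fisher p-value, smaller values are more extreme; with a Z statistic, larger absolute values are more extreme.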
> There is a reference implementation in R as the Exact package:
> [https://cran.r-project.org/web/packages/Exact/Exact.pdf]
> SciPy has implementations of Boschloo's test and the Z-pooled/unpooled test
> (which it names Barnard's test):
> [https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.boschloo_exact.html]
> [https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.barnard_exact.html]
> Note that the search for the nuisance parameter involves a univariate
> function with multiple local optima. The implementations in R and SciPy both
> use multiple start points to find candidate locations for the search for the
> maximum. This is done by using a set of uniform points in (0, 1) and then
> (optionally) optimising from the best candidate to find the maximum. The
> function has no analytical derivative (a gradient would require numerical
> differentiation), so it is suited to a derivative-free method such as Brent
> optimisation for the univariate case.
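The two-stage search could look like the sketch below. To keep it dependency-free it refines with a golden-section search, which stands in here for Brent's method; the number of start points, the tolerance, and the function names are assumptions and do not reflect the settings used by R, SciPy, or Commons Statistics.

```python
from math import comb, sqrt


def tail_probability(pi, extreme, m, n):
    """Probability of the set of extreme tables (i, j) at nuisance parameter pi."""
    return sum(comb(m, i) * comb(n, j) * pi ** (i + j) * (1 - pi) ** (m + n - i - j)
               for i, j in extreme)


def maximise(f, points=32, tol=1e-8):
    """Scan uniform interior points of (0, 1), then refine around the best one."""
    # Stage 1: evaluate f at uniformly spaced start points.
    h = 1.0 / (points + 1)
    start = max((k * h for k in range(1, points + 1)), key=f)
    # Stage 2: derivative-free golden-section search on the neighbouring interval.
    lo, hi = start - h, start + h
    g = (sqrt(5) - 1) / 2
    x1, x2 = hi - g * (hi - lo), lo + g * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:
            # The maximum lies in [x1, hi].
            lo, x1, f1 = x1, x2, f2
            x2 = lo + g * (hi - lo)
            f2 = f(x2)
        else:
            # The maximum lies in [lo, x2].
            hi, x2, f2 = x2, x1, f1
            x1 = hi - g * (hi - lo)
            f1 = f(x1)
    x = (lo + hi) / 2
    # Keep the start point if the refinement did not improve on it.
    return (x, f(x)) if f(x) >= f(start) else (start, f(start))
```

A call such as `maximise(lambda pi: tail_probability(pi, extreme, m, n))` returns the location of the maximum and the p-value there. The refinement only finds the local maximum near the best start point, which is why the density of the initial scan matters.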
> See also:
> [https://en.wikipedia.org/wiki/Boschloo%27s_test]
> [https://en.wikipedia.org/wiki/Barnard%27s_test]
>