Gimenez Olivier <[EMAIL PROTECTED]> wrote:
> ... we have three samples arising from three multinomials
> with the same number of cells. This can be represented as a table:
>
> n11 n12 ... n1k (1)
> n21 n22 ... n2k (2)
> n31 n32 ... n3k (3)
>
> We would like to know whether the last sample (3) can be
> considered a mixture of (1) and (2).
>
> Some help would be appreciated, especially references.
If you know the mixing proportions with which (1) and (2) combine, a
simple approach would be:
1. Convert (1) and (2) to expected probability distributions:

      p11 p12 ... p1k (4)
      p21 p22 ... p2k (5)

   by dividing each nij by the appropriate row total.

2. From the results, calculate a table of expected proportions
   for the mixture,

      q1 q2 ... qk

   where

      q1 = r(p11) + (1 - r)(p21)
      q2 = r(p12) + (1 - r)(p22)
      ...
      qk = r(p1k) + (1 - r)(p2k)

   and r and (1 - r) are the mixing proportions, with 0 < r < 1.

3. Let N3 be the number of observations in (3) above.
   Calculate expected frequencies e1, e2, ..., ek as

      e1 = N3 q1
      e2 = N3 q2
      ...
      ek = N3 qk

4. Compare the observed frequency distribution:

      n31 n32 ... n3k

   with the expected frequency distribution:

      e1 e2 ... ek

   using the likelihood ratio (LR) chi-squared test. For large
   samples, the statistic is approximately distributed as
   chi-squared with k-1 df. A nonsignificant result is consistent
   with the hypothesis that (3) is a mixture of (1) and (2).

   You can also use the Pearson chi-squared test to compare
   the distributions. It would also have k-1 df.
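
The four steps above can be sketched in Python roughly as follows. This
is only an illustration under stated assumptions: the function names
(mixture_expected, lr_chi2, pearson_chi2) and the counts are made up for
the example, r is taken as known, and every expected frequency is
assumed to be greater than zero.

```python
import math

def mixture_expected(n1, n2, n3_total, r):
    """Steps 1-3: expected frequencies for sample (3) given known r."""
    total1, total2 = sum(n1), sum(n2)
    p1 = [x / total1 for x in n1]          # row (4)
    p2 = [x / total2 for x in n2]          # row (5)
    q = [r * a + (1 - r) * b for a, b in zip(p1, p2)]
    return [n3_total * qj for qj in q]

def lr_chi2(observed, expected):
    """LR chi-squared: 2 * sum(obs * ln(obs / exp)).

    Cells with an observed count of zero contribute nothing to the sum.
    """
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)

def pearson_chi2(observed, expected):
    """Pearson chi-squared: sum((obs - exp)^2 / exp)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts, for illustration only.
n1 = [40, 30, 20, 10]
n2 = [10, 20, 30, 40]
n3 = [28, 24, 26, 22]
r = 0.5                                    # assumed known a priori

expected = mixture_expected(n1, n2, sum(n3), r)
df = len(n3) - 1                           # k - 1 when r is known
print("LR chi-squared:", lr_chi2(n3, expected), "df:", df)
print("Pearson chi-squared:", pearson_chi2(n3, expected), "df:", df)
```

If SciPy is available, a p-value can be obtained from either statistic
with scipy.stats.chi2.sf(statistic, df).
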
If you don't know the mixing proportion a priori, you would need to
estimate it. The usual criterion is maximum likelihood--i.e., the value
of r that maximizes the likelihood of observing n31, n32, ..., n3k given
q1, q2, ..., qk. The maximum likelihood value of r is the same as the
value that gives the lowest LR chi-squared, so you could simply try
different values of r until you find the one that minimizes it.
If you estimate r, the df for the LR chi-squared test are k - 2, since
one df is lost for the estimated parameter.
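
The trial-and-error search for r can be sketched as a simple grid
search over the open interval (0, 1). Again this is only an
illustration: the names g2_for_r and estimate_r, the grid resolution,
and the counts are assumptions made for the example, and the code
assumes that every cell with a positive observed count has a positive
expected probability. A numerical optimizer could replace the grid.

```python
import math

def g2_for_r(r, p1, p2, n3):
    """LR chi-squared of sample (3) against the mixture with proportion r."""
    n3_total = sum(n3)
    g2 = 0.0
    for a, b, o in zip(p1, p2, n3):
        e = n3_total * (r * a + (1 - r) * b)   # expected frequency
        if o > 0:                              # empty cells contribute 0
            g2 += 2.0 * o * math.log(o / e)
    return g2

def estimate_r(n1, n2, n3, steps=999):
    """Return (r, G2) for the grid value of r with the lowest LR chi-squared."""
    total1, total2 = sum(n1), sum(n2)
    p1 = [x / total1 for x in n1]
    p2 = [x / total2 for x in n2]
    # Evenly spaced candidates strictly between 0 and 1.
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    best_r = min(grid, key=lambda r: g2_for_r(r, p1, p2, n3))
    return best_r, g2_for_r(best_r, p1, p2, n3)

# Hypothetical counts, for illustration only.
n1 = [40, 30, 20, 10]
n2 = [10, 20, 30, 40]
n3 = [28, 24, 26, 22]

r_hat, g2 = estimate_r(n1, n2, n3)
df = len(n3) - 2                               # k - 2 because r is estimated
print("estimated r:", r_hat, "LR chi-squared:", g2, "df:", df)
```
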
For the formulas to calculate the LR and Pearson chi-squared statistics,
you could check:
Bishop YMM, Fienberg SE, Holland PW. Discrete multivariate
analysis: theory and practice. Cambridge, Massachusetts: MIT
Press, 1975
or any text on loglinear modeling, or one of Alan Agresti's books on
categorical data analysis.
--
John Uebersax
http://ourworld.compuserve.com/homepages/jsuebersax
[EMAIL PROTECTED]