Gimenez Olivier <[EMAIL PROTECTED]> wrote:
 
> ... we have three samples arising from three multinomials
> with the same number of cells. This can be represented as a table:
>
> n11 n12 ... n1k      (1)
> n21 n22 ... n2k      (2)
> n31 n32 ... n3k      (3)
>
> We would like to know whether the last sample (3) can be
> considered a mixture of (1) and (2).
>
> Some help would be appreciated, especially references.
 
If you know the mixing proportions with which (1) and (2) combine, a
simple approach would be:
 
1.  Convert (1) and (2) to estimated probability distributions:
 
       p11 p12 ... p1k    (4)
       p21 p22 ... p2k    (5)
 
    by dividing each nij by the appropriate row total.
 
2.  From the results, calculate a table of expected proportions
    for the mixture,
 
       q1  q2  ...  qk
 
    where
 
       q1 = r(p11) + (1 - r)(p21)
       q2 = r(p12) + (1 - r)(p22)
       ...
       qk = r(p1k) + (1 - r)(p2k)
 
    and r, (1 - r) are the mixing proportions, with 0 < r < 1.
 
3.  Let N3 be the number of observations in (3) above.
    Calculate expected frequencies e1, e2, ... ek as
 
       e1 = N3 q1
       e2 = N3 q2
       ...
       ek = N3 qk
 
4.  Compare the observed frequency distribution:
 
       n31  n32  ...  n3k
 
    with the expected frequency distribution:
 
       e1   e2   ...  ek
 
    using the likelihood ratio (LR) chi-squared test.  For large
    samples, and treating the pij as known, the statistic is
    approximately chi-squared distributed with k-1 df.  A
    nonsignificant result is consistent with the hypothesis that (3)
    is a mixture of (1) and (2).
 
    You can also use the Pearson chi-squared test to compare
    the distributions.  It would also have k-1 df.
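
As a rough sketch, steps 1-4 could be coded in Python along the
following lines.  The function name mixture_fit and the counts are
made up for illustration, and the code assumes every expected
frequency is greater than zero.

```python
import math

def mixture_fit(n1, n2, n3, r):
    """Return (G2, X2, df) for sample 3 against an r : (1 - r)
    mixture of the row proportions of samples 1 and 2."""
    N1, N2, N3 = sum(n1), sum(n2), sum(n3)
    p1 = [x / N1 for x in n1]            # step 1: proportions for (1)
    p2 = [x / N2 for x in n2]            # step 1: proportions for (2)
    q = [r * a + (1 - r) * b for a, b in zip(p1, p2)]   # step 2
    e = [N3 * qj for qj in q]            # step 3: expected frequencies
    # Step 4.  Cells with an observed count of 0 add nothing to G2.
    # Assumes every expected frequency is > 0.
    g2 = 2 * sum(o * math.log(o / ej) for o, ej in zip(n3, e) if o > 0)
    x2 = sum((o - ej) ** 2 / ej for o, ej in zip(n3, e))
    return g2, x2, len(n3) - 1           # df = k - 1 when r is known

# Hypothetical counts, for illustration only
n1 = [30, 20, 10, 40]
n2 = [10, 40, 30, 20]
n3 = [25, 28, 17, 30]
g2, x2, df = mixture_fit(n1, n2, n3, r=0.5)
print(g2, x2, df)
```

Either statistic is then referred to a chi-squared distribution with
the returned df; if SciPy is available, scipy.stats.chi2.sf(g2, df)
gives the p-value.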
 
If you don't know the mixing proportion a priori, you would need to
estimate it.  The usual criterion is maximum likelihood, i.e., the value
of r that maximizes the likelihood of observing n31, n32, ..., n3k given
q1, q2, ..., qk.  However, the maximum likelihood value of r is the same
as the value that gives the lowest LR chi-squared, so you could just use
trial and error, testing different values of r until you find the best
value.
 
If you estimate r, the df for the LR chi-squared test are k - 2.
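
The trial-and-error search could be automated with a simple grid over
r, as in the sketch below.  The helper names g2_for_r and estimate_r,
the grid resolution, and the counts are all assumptions made for
illustration; a numerical optimizer would do the same job.  The code
again assumes every expected frequency is greater than zero.

```python
import math

def g2_for_r(n1, n2, n3, r):
    """LR chi-squared of sample 3 against the r : (1 - r) mixture."""
    N1, N2, N3 = sum(n1), sum(n2), sum(n3)
    e = [N3 * (r * a / N1 + (1 - r) * b / N2) for a, b in zip(n1, n2)]
    return 2 * sum(o * math.log(o / ej) for o, ej in zip(n3, e) if o > 0)

def estimate_r(n1, n2, n3, steps=999):
    """Grid search for the r in (0, 1) that minimizes the LR statistic."""
    grid = [i / (steps + 1) for i in range(1, steps + 1)]
    r_hat = min(grid, key=lambda r: g2_for_r(n1, n2, n3, r))
    # One parameter (r) was estimated, so df = k - 2
    return r_hat, g2_for_r(n1, n2, n3, r_hat), len(n3) - 2

# Hypothetical counts, for illustration only
n1 = [30, 20, 10, 40]
n2 = [10, 40, 30, 20]
n3 = [25, 28, 17, 30]
r_hat, g2, df = estimate_r(n1, n2, n3)
print(r_hat, g2, df)
```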
 
For the formulas to calculate the LR and Pearson chi-squared statistics,
you could check:
 
     Bishop YMM, Fienberg SE, Holland PW.  Discrete multivariate
     analysis: theory and practice.  Cambridge, Massachusetts:  MIT
     Press, 1975.
 
or any text on loglinear modeling, or one of Alan Agresti's books on
categorical data analysis.
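
For quick reference, in the notation used above (observed counts n3j,
expected counts ej), the standard forms of the two statistics are as
follows, with G^2 denoting the LR statistic and X^2 the Pearson
statistic.  These symbols are the conventional ones and are not taken
from the original question.

```latex
G^2 = 2 \sum_{j=1}^{k} n_{3j} \ln\!\left(\frac{n_{3j}}{e_j}\right),
\qquad
X^2 = \sum_{j=1}^{k} \frac{(n_{3j} - e_j)^2}{e_j}
```

Cells with n3j = 0 contribute zero to G^2.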
 
--
John Uebersax
http://ourworld.compuserve.com/homepages/jsuebersax
[EMAIL PROTECTED]
 
 
 

