Re: Comparing contingency table output

Donald Burrill Thu, 06 Mar 2003 20:47:45 -0800

While driving to choir practice tonight I thought of a way to carry out
the analysis suggested in my earlier post that may be easier than you
were contemplating.  It requires only a statistics package that does
ordinary two-dimensional contingency-table chi-square analysis.


I had written:

> Some notation:  R = rows, C = columns, L = layers (nationalities in
> your example), N = total number of observations in the RxCxL table.
>
> Start with the overall RxC table.
>  Construct expected frequencies for the RxC subtables in each layer
> by multiplying the overall RxC frequencies by n(Li)/N.
> Accumulate contributions to a total chi-square of the usual form:
>  (observed - expected)^2/(expected) for each of the R*C*L cells,
>  added together over all cells.
> This total can be considered a chi-square with an appropriate number
> of degrees of freedom, which I am too lazy to work out at the moment
> but which might be something like
>   DF = (R-1)*(C-1)*(L-1) - (R-1)*(C-1)
>  (because you've specified R*C constraints in the expected frequencies
> of each table, but they're the same constraints for each layer and
> among the R*C values there are (R-1)(C-1) d.f.;  but I'd want to work
> out the proper algebra before placing any bets on this).

You can convert this 3-dimensional table into a two-dimensional one by
stringing the RxC cells out into a single vector.  E.g.,
 V = 10*R + C  will yield two-digit numbers (if the R and C codes run
from 1 to at most 9; the method can be altered if there are more) in
which the first digit is "R" and the second is "C".  You may want to recode
this V to have successive values from 1 to R*C, but it shouldn't be
necessary unless you're stuck with a rather primitive contingency-table
analysis program.
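
A minimal sketch of that recoding in Python, for anyone who wants to see
it spelled out.  The function names encode_cell and recode_consecutive
are made up for illustration, and the sketch assumes the R and C codes
are integers from 1 to 9:

```python
def encode_cell(r, c):
    """Combine a row code and a column code (each 1..9) into V = 10*R + C."""
    if not (1 <= r <= 9 and 1 <= c <= 9):
        raise ValueError("this sketch assumes codes from 1 to 9")
    return 10 * r + c


def recode_consecutive(codes):
    """Map the distinct V codes onto 1..K, in ascending order of V."""
    return {v: k for k, v in enumerate(sorted(set(codes)), start=1)}


# Example: a 2x3 table gives six V codes, recoded to 1..6.
v_codes = [encode_cell(r, c) for r in (1, 2) for c in (1, 2, 3)]
recoded = recode_consecutive(v_codes)
```

The second function is only needed if the analysis program insists on
consecutive category codes.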

Now treat your data set as entries in a  VxL  table.  The chi-square
calculation is automatic (so no special programming is required), and
one can usually obtain cell-wise contributions to chi-square, or the
standardized residuals (whose square is the contribution to chi-square
and whose sign tells you whether the observed frequency is higher or
lower than the expected frequency), or both.
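
For reference, here is a rough sketch in plain Python of what a package
does with the VxL table.  The function name chi_square_table and the
example counts are invented for illustration, and the sketch assumes
every row and column total is positive.  The expected frequency for a
cell is (row total)*(column total)/N, which is the same thing as the
overall RxC frequency multiplied by n(Li)/N.  The degrees of freedom
computed here are the usual ones for an independence test on a table
with R*C rows and L columns, namely (R*C - 1)*(L - 1); that is what a
package would report, and it may differ from the tentative figure in
the quoted post.

```python
from math import sqrt


def chi_square_table(table):
    """Pearson chi-square for a two-way table of counts.

    table: list of rows (one per V code), each a list of counts (one per layer).
    Returns (chi-square, degrees of freedom, residuals), where each residual
    is (observed - expected) / sqrt(expected).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            res = (observed - expected) / sqrt(expected)
            res_row.append(res)
            chi2 += res * res  # squared residual = cell's contribution
        residuals.append(res_row)
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df, residuals


# Invented counts: R = 2, C = 2 (so four V codes) and L = 2 layers.
example = [
    [10, 20],  # V = 11
    [20, 10],  # V = 12
    [15, 15],  # V = 21
    [5, 5],    # V = 22
]
chi2, df, residuals = chi_square_table(example)
```

A negative residual marks a cell whose observed frequency is below the
expected one, a positive residual a cell above it.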

Notice that this scheme works both for raw data (for which your analysis
program will accumulate the frequencies) and for frequency tables
(supposing the table is stored as values of 4 variables:  R, C, L, and
frequency).
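
To illustrate that the two forms of input lead to the same VxL table,
here is a small sketch.  The function name tabulate and the sample
records are hypothetical; the layer labels "A" and "B" stand in for
whatever the layers are coded as, and the V = 10*R + C coding again
assumes codes from 1 to 9:

```python
from collections import Counter


def tabulate(records):
    """Accumulate VxL cell counts.

    Each record is either (r, c, layer) for one raw observation,
    or (r, c, layer, frequency) for one row of a frequency table.
    """
    counts = Counter()
    for r, c, layer, *rest in records:
        frequency = rest[0] if rest else 1
        counts[(10 * r + c, layer)] += frequency
    return counts


# The same made-up data in both forms.
raw = [(1, 1, "A"), (1, 1, "A"), (1, 2, "B"), (2, 1, "A")]
freq = [(1, 1, "A", 2), (1, 2, "B", 1), (2, 1, "A", 1)]
same = tabulate(raw) == tabulate(freq)
```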

Hope this helps!    -- Don.
 -----------------------------------------------------------------------
 Donald F. Burrill                                            [EMAIL PROTECTED]
 56 Sebbins Pond Drive, Bedford, NH 03110                 (603) 626-0816
