Re: Comparing contingency table output

Donald Burrill Thu, 06 Mar 2003 08:14:52 -0800

On Thu, 6 Mar 2003 Steve <[EMAIL PROTECTED]> wrote:

> Is there a method available of comparing the output of the layers in
> a 3-way contingency table?  Specifically I wish to test if an
> association present between a pair of categories in one layer is
> present in another, and their relative strength. Layers are not of
> equal sample size.
>
> I have a hunch that the easiest way of doing this is to compare the
> ratio observed/expected between layers.  But is there a rigorous
> method of comparing such ratios?


I wouldn't put much faith in those ratios, partly because "expected"
(if it's from the usual two-dimensional expected frequency under the
hypothesis of independence of classifications) does not really reflect
the model you want to consider.

What I would do in these circumstances is to generate a set of expected
frequencies that DOES represent the model to be considered, departures
from which will be interesting.

Some notation:  R = rows, C = columns, L = layers (nationalities in
your example), N = total number of observations in the RxCxL table;
n(L1), n(L2), ... = numbers of observations in each layer L1, L2, ... .

Start with the overall RxC table (the observed frequencies summed over
layers).
Construct expected frequencies for the RxC subtables in each layer
by multiplying the overall RxC frequencies by n(Li)/N.
Accumulate contributions to a total chi-square of the usual form:
  (observed - expected)^2/(expected) for each of the R*C*L cells,
added together over all cells.
This total can be considered a chi-square with an appropriate number
of degrees of freedom.  The layout is the same as an ordinary two-way
table with R*C rows (one per cell of the RxC table) and L columns, and
the expected frequencies are the usual ones for independence in that
table, so
  DF = (R*C - 1)*(L - 1)
(the expected frequencies reproduce the R*C overall cell totals and
the L layer totals, and each of those two sets must sum to N).
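
The steps above can be sketched as follows.  This is only an
illustration in Python:  the function name, the nested-list layout
table[layer][row][column], and the counts are made up for the example
and are not taken from the original question.  The degrees of freedom
computed here, (R*C - 1)*(L - 1), rest on viewing the layout as an
ordinary two-way table with R*C rows and L columns, which is an
assumption of this sketch.

```python
def layer_homogeneity_chi_square(table):
    """Total chi-square for an RxCxL table.

    table[l][r][c] is the observed count in layer l, row r, column c.
    Assumes every cell of the overall RxC table is non-zero, so that no
    expected frequency is zero.
    """
    n_layers = len(table)
    n_rows = len(table[0])
    n_cols = len(table[0][0])

    # Overall RxC table: observed counts summed over layers.
    pooled = [[sum(table[l][r][c] for l in range(n_layers))
               for c in range(n_cols)]
              for r in range(n_rows)]

    # n(L1), n(L2), ... and the grand total N.
    layer_totals = [sum(sum(row) for row in layer) for layer in table]
    grand_total = sum(layer_totals)

    chi_square = 0.0
    for l in range(n_layers):
        for r in range(n_rows):
            for c in range(n_cols):
                # Expected = overall RxC frequency times n(Li)/N.
                expected = pooled[r][c] * layer_totals[l] / grand_total
                chi_square += (table[l][r][c] - expected) ** 2 / expected

    # Assumed degrees of freedom: (R*C - 1)*(L - 1).
    df = (n_rows * n_cols - 1) * (n_layers - 1)
    return chi_square, df


# Hypothetical counts: two layers of unequal size, each a 2x2 table.
counts = [
    [[10, 20], [30, 40]],   # layer 1, n = 100
    [[20, 10], [10, 10]],   # layer 2, n = 50
]
chi_square, df = layer_homogeneity_chi_square(counts)
print(chi_square, df)
```

The resulting statistic would then be referred to a chi-square table
(or a library routine) with df degrees of freedom.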

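That will tell you whether the data will reject the hypothesis that the
RxC pattern (and with it the association between R and C, whatEVER it
may be) is the same at all levels of L.  Bear in mind that layers which
differ only in their row or column margins will also contribute to this
chi-square.  But there's more:  the subtotal (of those contributions to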
a total chi-square) for each level of L will tell you whether THAT level
is different enough from the overall pattern to be interesting.  And the
contribution from each individual cell will tell you which cells are
inducing the effect (if any) at this level of L.
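
A minimal sketch of that breakdown, again in Python and again with
invented names and counts (nothing here comes from the turtle data):
it keeps each cell's contribution separately, then forms the subtotal
for each layer and picks out the cell that contributes most.

```python
def chi_square_contributions(table):
    """Per-cell contributions (observed - expected)^2 / expected.

    table[l][r][c] is the observed count in layer l, row r, column c.
    Expected frequencies are the overall RxC frequencies times n(Li)/N.
    Assumes no cell of the overall RxC table is zero.
    """
    n_layers = len(table)
    n_rows = len(table[0])
    n_cols = len(table[0][0])

    pooled = [[sum(table[l][r][c] for l in range(n_layers))
               for c in range(n_cols)]
              for r in range(n_rows)]
    layer_totals = [sum(sum(row) for row in layer) for layer in table]
    grand_total = sum(layer_totals)

    contributions = []
    for l in range(n_layers):
        layer_contrib = []
        for r in range(n_rows):
            row_contrib = []
            for c in range(n_cols):
                expected = pooled[r][c] * layer_totals[l] / grand_total
                row_contrib.append((table[l][r][c] - expected) ** 2 / expected)
            layer_contrib.append(row_contrib)
        contributions.append(layer_contrib)
    return contributions


# Hypothetical counts: two layers of unequal size, each a 2x2 table.
counts = [
    [[10, 20], [30, 40]],
    [[20, 10], [10, 10]],
]
contrib = chi_square_contributions(counts)

# Subtotal of the chi-square contributions for each level of L.
layer_subtotals = [sum(sum(row) for row in layer) for layer in contrib]

# (layer, row, column) indices of the cell with the largest contribution.
largest_cell = max(
    ((l, r, c)
     for l in range(len(contrib))
     for r in range(len(contrib[l]))
     for c in range(len(contrib[l][r]))),
    key=lambda idx: contrib[idx[0]][idx[1]][idx[2]],
)
print(layer_subtotals, largest_cell)
```

Indices are zero-based, so a layer index of 1 refers to the second
layer.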

Of course, if you find any interesting effects, there will be some
more work needed to detect whether OTHER interesting effects were
being masked by what you've found so far.  But that's a standard
problem in RxC contingency table work, and the same techniques for
dealing with them will also work in this context:  that's a topic for
another day.
                          -- DFB.

> A real world example:
>
> Each case in my data is a scientific article concerning marine
> turtles, categorised by species involved, field of research and
> nationality of author.  I know that ignoring nationality there is a
> strong positive association between research on green turtles and
> research on pathology.  Is this association independent of
> nationality?
>
> Thank you in advance for any help.
>
> Steve

 -----------------------------------------------------------------------
 Donald F. Burrill                                            [EMAIL PROTECTED]
 56 Sebbins Pond Drive, Bedford, NH 03110                 (603) 626-0816

