While driving to choir practice tonight I thought of a way to carry out the analysis suggested in my earlier post that may be easier than what you were contemplating. It requires only a stat package that does ordinary two-dimensional contingency-table chi-square analysis.

I had written:

> Some notation: R = rows, C = columns, L = layers (nationalities in
> your example), N = total number of observations in the RxCxL table.
>
> Start with the overall RxC table.
> Construct expected frequencies for the RxC subtables in each layer
> by multiplying the overall RxC frequencies by n(Li)/N.
> Accumulate contributions to a total chi-square of the usual form:
> (observed - expected)^2/(expected) for each of the R*C*L cells,
> added together over all cells.
> This total can be considered a chi-square with an appropriate number
> of degrees of freedom, which I am too lazy to work out at the moment
> but which might be something like
> DF = (R-1)*(C-1)*(L-1) - (R-1)*(C-1)
> (because you've specified R*C constraints in the expected frequencies
> of each table, but they're the same constraints for each layer and
> among the R*C values there are (R-1)(C-1) d.f.; but I'd want to work
> out the proper algebra before placing any bets on this).

You can convert this 3-dimensional table into a two-dimensional one by stringing the RxC cells out into a single vector. E.g., V = 10*R + C will yield two-digit numbers (if R and C have values that do not exceed 10; the method can be altered if there are more) in which the first digit is "R" and the second is "C". You may want to recode this V to have successive values from 1 to R*C, but it shouldn't be necessary unless you're stuck with a rather primitive contingency-table analysis program.

Now treat your data set as entries in a VxL table. The chi-square calculation is automatic (so no special programming is required), and one can usually obtain cell-wise contributions to chi-square, or the standardized residuals (whose square is the contribution to chi-square and whose sign tells you whether the observed frequency is higher or lower than the expected frequency), or both.
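
For anyone without a package at hand, here is a minimal sketch in Python (standard library only) of what a contingency-table program would do with the VxL table. The counts in `records` and the function name `chi_square_v_by_l` are made up for illustration, and the code assumes R and C are coded as small positive integers so that V = 10*R + C is unique for each cell.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical frequency table: one (R, C, L, frequency) record per cell.
records = [
    (1, 1, 1, 20), (1, 2, 1, 10), (2, 1, 1, 15), (2, 2, 1, 5),
    (1, 1, 2, 10), (1, 2, 2, 20), (2, 1, 2, 5), (2, 2, 2, 15),
]

def chi_square_v_by_l(records):
    """Pearson chi-square for the V x L table, where V = 10*R + C."""
    observed = defaultdict(float)  # (V, L) -> observed frequency
    v_total = defaultdict(float)   # V -> overall RxC frequency for that cell
    l_total = defaultdict(float)   # L -> n(Li), the layer total
    for r, c, l, freq in records:
        v = 10 * r + c             # string the RxC cells out into one variable
        observed[(v, l)] += freq
        v_total[v] += freq
        l_total[l] += freq
    n = sum(l_total.values())      # N, the total number of observations

    chi_square = 0.0
    residuals = {}
    for v in v_total:
        for l in l_total:
            # Expected = overall RxC frequency times n(Li)/N.
            expected = v_total[v] * l_total[l] / n
            resid = (observed.get((v, l), 0.0) - expected) / sqrt(expected)
            residuals[(v, l)] = resid   # sign: above or below expectation
            chi_square += resid ** 2    # cell-wise contribution
    df = (len(v_total) - 1) * (len(l_total) - 1)
    return chi_square, df, residuals

chi_square, df, residuals = chi_square_v_by_l(records)
print(round(chi_square, 4), df)
for cell in sorted(residuals):
    print(cell, round(residuals[cell], 3))
```

Raw data can be fed to the same function as one record per observation with a frequency of 1. Note that the d.f. computed here is the ordinary two-way value, (number of V categories - 1)*(L - 1), which is what a standard program will report for the VxL table; that is not the same as the tentative formula in the quoted post, so it is worth checking against your own algebra.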

Notice that this scheme works both for raw data (for which your analysis program will accumulate the frequencies) and for frequency tables (supposing the table is stored as values of 4 variables: R, C, L, and frequency).

Hope this helps!
 -- Don.
 -----------------------------------------------------------------------
 Donald F. Burrill                                    [EMAIL PROTECTED]
 56 Sebbins Pond Drive, Bedford, NH 03110                (603) 626-0816
