Jim, a few comments in addition to those made by other respondents:

On Mon, 15 Jan 2001, Jim Kroger wrote in part:

> I'm doing a two-way, 2X2 ANOVA. Suppose I have 20 subjects, and each has
> 25 observations of the following types:
> 
> drug1-doseA (25 for each subject)
> drug1-doseB ( " )
> drug2-doseA
> drug2-doseB
                You have not stated how many subjects (Ss) receive each 
drug/dose combination.  If all 20 Ss receive all 4 combinations, as 
implied by your assertion of 80 data points (below), then the design is 
not completely given:  you must also have a factor representing the order 
in which the combinations were encountered by each S.  If all Ss got the 
various drug/dose combinations in the same order, you have no way of 
telling whether (e.g.) the first combination affected responses to the 
later combinations, nor even in which direction.  Also in this case, as 
one other respondent pointed out, you have a repeated-measures design:
that is, instead of  S(G x E)  as in the usual 2-way ANOVA (using S for 
Subjects, G for druG, E for dosE), you have  S x G x E,  which implies 
(inter alia) different error terms for G, E, and GE.

(In that notation, using R for the raw data values within each S, the two 
designs above would be  R(S(G x E))  and  R(S) x G x E.  As remarked 
below, the same ratios of mean squares are computed, whether R is 
explicitly accounted for in the design or not.)

> ... The objective is to determine whether there is a main effect of 
> either factor (drug type and dosage) on temperature, and whether there 
> is an interaction between the two.
> 
> My question is, should I determine an average for each subject in each 
> of the four cells, and use this as data to put into the ANOVA, or 
> should I put the raw trials themselves into the ANOVA?

        As others have pointed out, it makes no difference.  Your 
hypotheses are tested by comparisons among the cell means, and the 
denominator mean square for each hypothesis will be the estimated 
sampling variance of the means in question.  Whether you use the mean of 
25 trials for each S, or the 25 raw trials themselves, you get the same 
F ratios, provided each effect is tested against its interaction with Ss.
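
The equivalence can be checked numerically.  What follows is a minimal 
sketch, assuming a balanced S x G x E design with 20 Ss and 25 trials per 
cell.  The data are simulated, and the function name rm_anova_f and all 
effect sizes are illustrative assumptions, not part of the study 
described.  Each effect is tested against its own interaction with Ss.

```python
import numpy as np

def rm_anova_f(y):
    """F ratios for G, E and GE in a balanced S x G x E design.

    y has shape (subjects, drugs, doses, trials).  Each effect is tested
    against its interaction with subjects (hypothetical helper).
    """
    n_s, n_g, n_e, n_r = y.shape
    cell = y.mean(axis=3)                      # one mean per S x G x E cell
    grand = cell.mean()
    s = cell.mean(axis=(1, 2)) - grand         # subject effects
    g = cell.mean(axis=(0, 2)) - grand         # drug effects
    e = cell.mean(axis=(0, 1)) - grand         # dose effects
    sg = cell.mean(axis=2) - grand - s[:, None] - g[None, :]
    se = cell.mean(axis=1) - grand - s[:, None] - e[None, :]
    ge = cell.mean(axis=0) - grand - g[:, None] - e[None, :]
    sge = (cell - grand
           - s[:, None, None] - g[None, :, None] - e[None, None, :]
           - sg[:, :, None] - se[:, None, :] - ge[None, :, :])

    def ms(effect, n_per_mean, df):
        # n_r multiplies every mean square, so it cancels in each F ratio.
        return n_r * n_per_mean * np.sum(effect ** 2) / df

    f_g = ms(g, n_s * n_e, n_g - 1) / ms(sg, n_e, (n_s - 1) * (n_g - 1))
    f_e = ms(e, n_s * n_g, n_e - 1) / ms(se, n_g, (n_s - 1) * (n_e - 1))
    f_ge = (ms(ge, n_s, (n_g - 1) * (n_e - 1))
            / ms(sge, 1, (n_s - 1) * (n_g - 1) * (n_e - 1)))
    return {"G": f_g, "E": f_e, "GE": f_ge}

# Simulated data: all values below are assumptions for illustration only.
rng = np.random.default_rng(0)
subject = rng.normal(0.0, 1.0, size=(20, 1, 1, 1))
subj_by_cell = rng.normal(0.0, 0.5, size=(20, 2, 2, 1))
drug = np.array([0.0, 0.4]).reshape(1, 2, 1, 1)
noise = rng.normal(0.0, 1.0, size=(20, 2, 2, 25))
raw = 37.0 + subject + subj_by_cell + drug + noise

f_raw = rm_anova_f(raw)                                  # 2000 raw trials
f_means = rm_anova_f(raw.mean(axis=3, keepdims=True))    # 80 cell means
print(f_raw)
print(f_means)
```

The two printed dictionaries agree (up to rounding error), and in both 
cases each F ratio has 1 and 19 degrees of freedom:  the mean squares from 
the raw trials are simply 25 times those from the means, in numerator and 
denominator alike.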

> There would be 80 datapoints and 2000 data points respectively, 20 or 
> 500 per cell.  I have seen both approaches taken but never heard a 
> satisfactory justification.  It would seem they are not equivalent 
> since the latter, having more observations, has greater power. 

        As remarked above, this is not true.  One has the same numerical 
estimates, and the same numbers of degrees of freedom in numerator and 
denominator, for each hypothesis test.  As other respondents have 
mentioned, if the raw data are a 0/1 dichotomy, there may be an advantage 
in using a logistic rather than a normal model;  but if the proportions 
(the several cell means) are not very close to 0 or 1, say between .25 
and .75, there will not be much difference in the results of the 
analysis. 
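
One way to see why the two models roughly agree in that range:  the logit 
is nearly linear in the proportion p between .25 and .75.  The short 
sketch below is an illustrative comparison, not part of the analysis under 
discussion;  it sets logit(p) beside its tangent line at p = .5, which has 
slope 4.

```python
import math

def logit(p):
    """Log-odds of a proportion p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

def tangent_at_half(p):
    """Tangent line to the logit at p = 0.5 (slope 1/(p(1-p)) = 4 there)."""
    return 4.0 * (p - 0.5)

for p in (0.25, 0.40, 0.50, 0.60, 0.75, 0.95):
    gap = abs(logit(p) - tangent_at_half(p))
    print(f"p={p:.2f}  logit={logit(p):+.3f}  "
          f"linear={tangent_at_half(p):+.3f}  gap={gap:.3f}")
```

Between .25 and .75 the gap stays below 0.1 on the logit scale;  at .95 it 
exceeds 1, which is where the choice of model begins to matter.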

 ----------------------------------------------------------------------
 Donald F. Burrill                                    [EMAIL PROTECTED]
 348 Hyde Hall, Plymouth State College,      [EMAIL PROTECTED]
 MSC #29, Plymouth, NH 03264                             (603) 535-2597
 Department of Mathematics, Boston University                [EMAIL PROTECTED]
 111 Cummington Street, room 261, Boston, MA 02215       (617) 353-5288
 184 Nashua Road, Bedford, NH 03110                      (603) 471-7128


