On 1 Mar 2003 09:57:29 -0800, David Reilly wrote:

>A colleague has asked me why one tests the significance of a main
>effect ...say there are three groups or classes ....simultaneously via
>an F test rather than simply specifying two dummy variables and a mean
>in a Regression Model and doing a step-down.
[...]

>He contends that one can simply stepdown from the complete model and
>delete one-by-one any of the two dummy variables until parsimony is
>achieved. This avoids the portmanteau F test and relies strictly on
>the individual T values ...

The problem is that the dummy variables don't represent groups per se,
but differences in means between groups. So the conclusions from this
approach depend on how you choose to define the dummy variables.
Compare (group 3 as the baseline)

  x1 = 1 if group 1, 0 otherwise (coefficient compares groups 1 and 3)
  x2 = 1 if group 2, 0 otherwise (coefficient compares groups 2 and 3)

with (group 1 as the baseline)

  x1 = 1 if group 2, 0 otherwise (coefficient compares groups 1 and 2)
  x2 = 1 if group 3, 0 otherwise (coefficient compares groups 1 and 3)

Because you're not comparing all possible pairs of groups, you can't
possibly get the proper picture of how all the groups compare (as you
would with an F-test followed by Tukey or whatever).

I imagine it would be easy to demonstrate this with data. For
instance, suppose F-plus-Tukey says that only groups 1 and 2 have
significantly different means, and the mean of group 3 is about
halfway between the means of groups 1 and 2. What would come out of
the first choice of dummy variables above? Each coefficient would
measure only half of the 1-vs-2 difference, so neither t value need
be significant even when the overall F test is.

Also, think about what would come out of the end of the step-down
process: if you are left with one significant dummy variable, it
means there's a significant difference in means between the group
providing the 1's and *all the rest of the data combined*, which is
not usually what you're trying to conclude in ANOVA.

Quite aside from all this, there's the general issue of how much you
can trust any P-values that come out of a step-by-step procedure such
as the one proposed.

>I can't find any textbook that supports my position and suggests that
>what he is doing is flawed.

Any text that deals with multiple regression should at least address
the different ways of defining dummy variables, and then you can go
from there.

-- 
Ken Butler
Possibly at home somewhere in Kent, but possibly not. . .
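The demonstration suggested above can be sketched in a few lines. This is a minimal, hypothetical illustration in pure Python (no statistics library assumed): the data are made up so that group 3's mean (12) lies exactly halfway between those of groups 1 (10) and 2 (14), with five observations per group. The helper names (inverse, ols, design, overall_f, step_down, one_vs_rest) are invented for this sketch, and step_down is only one plausible reading of the proposed procedure: drop the dummy with the smallest |t| until every remaining |t| reaches the two-sided 5% point of Student's t. With these numbers the overall F is 8 on (2, 12) df, well beyond its 5% point of about 3.89. With group 3 as the baseline both dummies have |t| = 2, below the 5% critical value of about 2.18, whereas with group 1 as the baseline the t values are 4 and 2. Either way the step-down ends with a single dummy whose coefficient is one group's mean minus the pooled mean of the other two groups.

```python
# Hypothetical illustration with made-up data (pure Python, no libraries).
# Three groups of five; group 3's mean lies halfway between groups 1 and 2.

# Two-sided 5% critical values of Student's t for the residual df that arise
# here (12 with two dummies plus intercept, 13 with one dummy plus intercept).
T_CRIT_5PCT = {12: 2.179, 13: 2.160}

groups = [1] * 5 + [2] * 5 + [3] * 5
y = [8, 9, 10, 11, 12,      # group 1, mean 10
     12, 13, 14, 15, 16,    # group 2, mean 14
     10, 11, 12, 13, 14]    # group 3, mean 12


def inverse(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols(X, resp):
    """Least-squares fit of resp on the columns of X.

    Returns (coefficients, t statistics, residual sum of squares, residual df).
    """
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * r for row, r in zip(X, resp)) for i in range(k)]
    inv = inverse(xtx)
    b = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    rss = sum((r - sum(bj * xj for bj, xj in zip(b, row))) ** 2
              for row, r in zip(X, resp))
    df = len(resp) - k
    mse = rss / df
    t = [b[i] / (mse * inv[i][i]) ** 0.5 for i in range(k)]
    return b, t, rss, df


def design(dummy_groups):
    """Intercept column plus one 0/1 dummy for each group in dummy_groups."""
    return [[1.0] + [1.0 if g == d else 0.0 for d in dummy_groups] for g in groups]


def overall_f(dummy_groups):
    """F test of all dummies together against the intercept-only model."""
    _, _, rss_full, df_full = ols(design(dummy_groups), y)
    _, _, rss_null, df_null = ols(design([]), y)
    return ((rss_null - rss_full) / (df_null - df_full)) / (rss_full / df_full)


def step_down(dummy_groups):
    """One hypothetical reading of the proposed step-down: refit, drop the dummy
    with the smallest |t| if it is below the 5% critical value, and stop once
    every remaining dummy is significant. Returns (kept groups, coefficients)."""
    kept = list(dummy_groups)
    while kept:
        b, t, _, df = ols(design(kept), y)
        weakest = min(range(len(kept)), key=lambda i: abs(t[i + 1]))
        if abs(t[weakest + 1]) >= T_CRIT_5PCT[df]:
            return kept, b[1:]
        kept.pop(weakest)
    return kept, []


def one_vs_rest(g):
    """Mean of group g minus the mean of all other observations pooled."""
    inside = [r for gi, r in zip(groups, y) if gi == g]
    outside = [r for gi, r in zip(groups, y) if gi != g]
    return sum(inside) / len(inside) - sum(outside) / len(outside)


for label, dummies in [("baseline group 3", [1, 2]), ("baseline group 1", [2, 3])]:
    b, t, _, _ = ols(design(dummies), y)
    kept, coef = step_down(dummies)
    print(label)
    print("  dummies for groups", dummies,
          "coefficients", [round(v, 3) for v in b[1:]],
          "t", [round(v, 3) for v in t[1:]])
    print("  overall F =", round(overall_f(dummies), 3), "on (2, 12) df")
    print("  step-down keeps group(s)", kept,
          "coefficient(s)", [round(v, 3) for v in coef],
          "one-vs-rest mean difference(s)", [round(one_vs_rest(g), 3) for g in kept])
```

Under the first coding the two |t| values tie, so which dummy survives the step-down is arbitrary (in this sketch it is settled by floating-point rounding); under the second coding the survivor is group 2 against groups 1 and 3 combined (14 minus 11 = 3). The overall F, by contrast, is identical under both codings, because the two sets of dummies span the same column space.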
=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
                  http://jse.stat.ncsu.edu/
=================================================================
