On 1 Mar 2003 09:57:29 -0800, David Reilly wrote:

>A colleague has asked me why one tests the significance of a main
>effect ...say there are three groups or classes ....simultaneously via
>an F test rather than simply specifying two dummy variables and a mean
>in a Regression Model and doing a step-down.

[...]

>He contends that one can simply stepdown from the complete model and
>delete one-by-one any of the two dummy variables until parsimony is
>achieved. This avoids the portmanteau F test and relies strictly on
>the individual T values ...

The problem is that the dummy variables don't represent groups per se,
but represent differences in means between groups. So the conclusions
from this approach depend on how you chose to define the dummy
variables. Compare

x1=1 if group 1, 0 otherwise (coefficient compares groups 1 and 3)
x2=1 if group 2, 0 otherwise (coefficient compares groups 2 and 3)

with

x1=1 if group 2, 0 otherwise (coefficient compares groups 1 and 2)
x2=1 if group 3, 0 otherwise (coefficient compares groups 1 and 3)

Because two dummy variables compare only two of the three possible
pairs of groups, you can't possibly get the proper picture of how all
the groups compare (as you would with an F-test followed by Tukey or
whatever). In the first choice above, for instance, groups 1 and 2 are
never compared directly. I imagine it would be easy to demonstrate this
with data -- for instance, if F-plus-Tukey says that only groups 1 and 2
have significantly different means, and the mean of group 3 is about
halfway between the means of groups 1 and 2, what would come out of the
first choice of dummy variables above?
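Here is a rough sketch of that demonstration in Python with NumPy. The
data are made up by me purely for illustration (6 observations per
group, group means 0, 2 and 1, so group 3 sits halfway), and the helper
name fit is my own, not anything from a package:

```python
import numpy as np

# Made-up data: 6 observations per group, identical within-group spread.
# Group means are 0, 2 and 1, so group 3 is halfway between groups 1 and 2.
dev = np.array([-1.5, -0.9, -0.3, 0.3, 0.9, 1.5])
y = np.concatenate([0 + dev, 2 + dev, 1 + dev])
group = np.repeat([1, 2, 3], 6)


def fit(y, dummy_groups, group):
    """OLS of y on an intercept plus one 0/1 dummy per listed group.

    Returns (coefficients, t-values, residual sum of squares).
    """
    columns = [np.ones(len(y))]
    columns += [(group == g).astype(float) for g in dummy_groups]
    X = np.column_stack(columns)
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sse = resid @ resid
    df = len(y) - X.shape[1]
    se = np.sqrt(np.diag(xtx_inv) * sse / df)
    return beta, beta / se, sse


# First coding: dummies for groups 1 and 2, so group 3 is the reference.
beta_a, t_a, sse_a = fit(y, [1, 2], group)
# Second coding: dummies for groups 2 and 3, so group 1 is the reference.
beta_b, t_b, sse_b = fit(y, [2, 3], group)

# Overall F statistic for "all three means equal" (2 and 15 df here).
sst = np.sum((y - y.mean()) ** 2)
f_stat = ((sst - sse_a) / 2) / (sse_a / 15)

print("first coding, t for x1, x2: ", np.round(t_a[1:], 3))
print("second coding, t for x1, x2:", np.round(t_b[1:], 3))
print("F(2, 15):", round(f_stat, 3))
```

On these numbers the first coding gives t-values of about -1.54 and
1.54, both short of the 5% two-sided critical value of roughly 2.13 on
15 df, so step-down would be tempted to throw out both dummies. The
second coding gives about 3.09 for x1, which is clearly significant.
The F statistic, about 4.76 against a critical value of roughly 3.68,
is the same whichever coding you use, since both codings fit the same
three group means.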

Also, think about what would come out of the end of the step-down
process: if you are left with one significant dummy variable, it means
there's a significant difference in means between the group providing
the 1's and *all the rest of the data combined*, because the deleted
dummy's group has been merged into the reference group. That is not
usually what you're trying to conclude in ANOVA.
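A quick way to convince yourself of this, again with data I have
invented for the purpose (three groups of 6 with means 0, 2 and 1), is
to fit the reduced model with only the group-1 dummy and compare its
coefficient with the plain difference in means:

```python
import numpy as np

# Made-up data: three groups of 6 with means 0, 2 and 1.
dev = np.array([-1.5, -0.9, -0.3, 0.3, 0.9, 1.5])
y = np.concatenate([0 + dev, 2 + dev, 1 + dev])
group = np.repeat([1, 2, 3], 6)

# Reduced model left after step-down: intercept plus the group-1 dummy only.
x1 = (group == 1).astype(float)
X = np.column_stack([np.ones(len(y)), x1])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Difference between the group-1 mean and the mean of everything else.
mean_g1 = y[group == 1].mean()
mean_rest = y[group != 1].mean()

print("coefficient of x1:       ", round(beta[1], 3))
print("mean(g1) - mean(g2 & g3):", round(mean_g1 - mean_rest, 3))
```

Both numbers come out at -1.5 here: the intercept has become the mean
of groups 2 and 3 pooled together, so the dummy no longer compares
group 1 with any single group.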

Quite aside from all this, there's the general issue of how much you can
trust any P-values that come out of a step-by-step procedure such as the
one proposed.

>I can't find any textbook that supports my position and suggests that
>what he is doing is flawed.

Any text that deals with multiple regression should at least discuss
the different ways of defining dummy variables, and then you can go
from there.

-- 
Ken Butler
Possibly at home somewhere in Kent,
but possibly not.