Sorry, this turned out to be rather longer than I'd anticipated.
Maybe I should have broken it into parts...
On Wed, 22 Dec 1999, Rich Ulrich wrote:
<< There were several earlier messages, and then
I thought Don Burrill said most of what needed to be said -- >>
[ snip, various details.]
<< Later Don recommended constructing dummies for the Interactions in
such a manner that they would be orthogonal to the main effects,
in order to reduce confusion of confounded interpretations; that
bit of advice received a minor criticism from someone else who
pointed out that you should never be trying to interpret *those*
coefficients in the first place. >>
Never say "never"! See below ... :-)
<< Well, I agree with both Don and the critic. I create my
interactions as orthogonal, or approximately orthogonal -- in the
old days, your program was too likely to blow up if you did not
get rid of all the numerical problems you could, whenever you
could. Further, if I happen to look at the wrong listing, it will
still have numbers that are in the right range, and PROBABLY right.
Finally, it may be a cheap piece of consistency, but it gives me
one less item that I have to explain to the non-statisticians who
look at various results.
<< Like the critic, though, I never want to interpret the coded main
effects in any regression that has also included the interactions. >>
Why not? If an interaction is significant, do you mean to
say that you would never attempt to make sense of the main effects,
or of the pattern of results involving (main effects + interactions)?
Surely you cannot mean that you would conduct an analysis _without_
the interactions, and attempt to interpret the main effects in such
an analysis -- these results would likely be horribly misleading, if
interaction effects are in fact present. What would you do in the
context of, say, a four-factor ANOVA? Would you take the presence of
significant interactions to imply that main effects cannot, or perhaps
should not, be interpreted?
<< - I would not mind receiving guidance on this final point. It is
*conceivable* to use codings so that the coefficients and tests for
main effects do have meaning when the interaction is included in a
regression. >>
Let me describe an example. In the Minitab data set PULSE there are
eight variables:
an initial pulse rate (PULSE1),
a second pulse rate (PULSE2, taken 1 minute after PULSE1),
a binary experimental/control variable (RAN, = 1 if the
respondent ran in place during the minute between pulse
measurements, = 2 if the respondent sat quietly during
the minute),
SMOKES (= 1 if the respondent smoked regularly, = 2 if not),
SEX (= 1 if male, = 2 if female),
HEIGHT (in inches),
WEIGHT (in lbs), and
ACTIVITY (= 1 if the respondent reported little or no regular
exercise or physical activity, = 2 if the respondent
reported a moderate amount, = 3 if "a lot").
We focus on the first five variables.
If we model PULSE2 as a function of (PULSE1, RAN, SMOKES, SEX)
we might have a 3-way analysis of covariance with one covariate (PULSE1).
Since the data are unbalanced, practically all ANCOVA programs will choke
on them and refuse to yield an analysis. So we use multiple regression.
(Which, among other virtues, permits us to examine interactions of the
binary factors and PULSE1; this would be impossible in "classical" (?!)
ANCOVA.) Skipping the boring details (which may be found in a paper of
mine on the Minitab home page), a reasonable reduced model retains as
predictors PULSE1, RAN, SEX, SMOKES, SMOKES*RAN, and SEX*RAN, when
SMOKES*RAN and SEX*RAN have been constructed as orthogonal to SMOKES and
RAN, and to SEX and RAN, respectively.
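In case the orthogonalizing step itself is unfamiliar: one way to
construct such an interaction term is to regress the raw product on its
lower-order terms and keep the residuals. A rough sketch in Python (not
the Minitab commands I actually used, and the names are only
illustrative), just to show the idea:

    import numpy as np

    def orthogonalize(product, lower_order_terms):
        # Residuals of `product` after regressing it on a constant plus
        # its lower-order terms: the part of the interaction that is
        # orthogonal to those terms.
        X = np.column_stack([np.ones(len(product))] + list(lower_order_terms))
        beta, *_ = np.linalg.lstsq(X, product, rcond=None)
        return product - X @ beta

    # e.g., SMOKES*RAN made orthogonal to SMOKES and RAN:
    # smokes_ran_orth = orthogonalize(smokes * ran, [smokes, ran])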
Interestingly, if one recodes RAN, SEX, and SMOKES to (0,1) for
ease of interpretation (RAN = 1 if ran in place, = 0 if not; SMOKES = 1
if smoker, 0 if not; SEX = 1 if female, = 0 if male), constructs the two
interactions as the simple products of their components (SEX*RAN = 1 if a
female who ran, = 0 else; SMOKES*RAN = 1 if a smoker who ran, = 0 else),
and then fits the regression model
PULSE2 = a + b1*PULSE1 + b2*RAN + b3*SMOKES + b4*SEX
+ b5*SEX*RAN + b6*SMOKES*RAN + error
the coefficients b3 and b4 are indistinguishable from zero. Dropping
SEX and SMOKES from the model, we then have
PULSE2 = a + b1*PULSE1 + b2*RAN + b5*SEX*RAN + b6*SMOKES*RAN + error
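For readers who don't have Minitab handy, the two fits are easy enough
to reproduce elsewhere. A sketch in Python (statsmodels), assuming the
worksheet has been exported to a hypothetical file pulse.csv still
carrying the original (1,2) codes:

    import pandas as pd
    import statsmodels.formula.api as smf

    pulse = pd.read_csv("pulse.csv")                       # hypothetical export
    pulse["RAN"]    = (pulse["RAN"] == 1).astype(int)      # 1 = ran in place
    pulse["SMOKES"] = (pulse["SMOKES"] == 1).astype(int)   # 1 = smoker
    pulse["SEX"]    = (pulse["SEX"] == 2).astype(int)      # 1 = female

    full    = smf.ols("PULSE2 ~ PULSE1 + RAN + SMOKES + SEX"
                      " + SEX:RAN + SMOKES:RAN", data=pulse).fit()
    reduced = smf.ols("PULSE2 ~ PULSE1 + RAN + SEX:RAN + SMOKES:RAN",
                      data=pulse).fit()
    print(full.summary())      # b3 (SMOKES) and b4 (SEX) near zero
    print(reduced.summary())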
Now some folks get unhappy at the idea of retaining an interaction
(e.g., SEX*RAN) without one of its main effects (e.g., SEX); but look
at what we have:
(1) A common slope of PULSE2 on PULSE1 for all subgroups (b1),
which may be not unreasonable;
(2) a common line for everyone who did not run, which is reasonable
(then RAN = SEX*RAN = SMOKES*RAN = 0):
this line is PULSE2 = a + b1*PULSE1;
(3) an increment in pulse rate (= b2) for those who ran in place,
as one would expect;
(4) an additional increment (= b5) for females who ran in place,
which one might not expect;
(5) a (small, as it turned out) decrement (= b6) for smokers who ran
in place, which might not be surprising.
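Spelled out, the fitted line for each subgroup is simply
    anyone who did not run:        PULSE2 = a + b1*PULSE1
    male non-smokers who ran:      PULSE2 = (a + b2) + b1*PULSE1
    female non-smokers who ran:    PULSE2 = (a + b2 + b5) + b1*PULSE1
    male smokers who ran:          PULSE2 = (a + b2 + b6) + b1*PULSE1
    female smokers who ran:        PULSE2 = (a + b2 + b5 + b6) + b1*PULSE1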
These convenient interpretations arise from using indicator variables
with (0,1) coding (and from being moderately clever in deciding what
level to assign to 0 in each case). But getting to this reduced model
is difficult without going through orthogonalizing routines on the way,
because of the spurious intercorrelations among interaction terms, most
of which turn out not to be significant. Having reduced the predictors
to the six shown above, it is then convenient to use (0,1) coding because
of the kinds of interpretation that thus become possible; and then one
observes, as a function of this coding in particular, that the "main
effects" for SMOKES and SEX vanish, leaving only the two product terms.
Moreover, the product terms themselves are conveniently interpretable,
being indicator variables (indicating a particular subgroup in the data).
<< If it is what I remember seeing in an ANOVA text many
years ago, the weights and coefficients can be constructed to take
into account the Ns of the cells (more complicated than -1,0,1).
I believe: The test that this gives you for main effects is either
exactly the same as some other way of constructing the problem, or it
is considered obsolete. >>
I remember knowing that one could do this. Never saw much point in
it, though; in nearly every case of unequal N's I've encountered,
I've been much more interested in "unweighted" (which really means
equally weighted) means than in means weighted by the cell N's.
I can see that others might feel differently; this is, in essence,
the same debate that arose in the Constitutional Convention, and which
was resolved by the compromise that certain matters would be dealt with
in a Senate (unweighted by population, equally weighted by jurisdiction)
and others in a House of Representatives (weighted by population within
jurisdictions).
<< The construction that I like is Searle's partitioning of sums of
squares, usually in a hierarchy: (A), (B|A), (AB|A,B) [e.g.]. >>
Yes, and that's one of the reasons I like Minitab's regression output,
because it provides these hierarchical sums of squares (sequential SS
is Minitab's label for them), in the order in which the predictors
were specified in the regression command.
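The same hierarchy can be had outside Minitab, of course. A sketch in
Python (statsmodels, reusing the hypothetical pulse data frame from the
sketch above), where typ=1 requests exactly these sequential sums of
squares:

    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    fit = smf.ols("PULSE2 ~ PULSE1 + RAN + SEX:RAN + SMOKES:RAN",
                  data=pulse).fit()
    # Type I (sequential) SS: each term adjusted only for the terms
    # listed before it -- the (A), (B|A), (AB|A,B) hierarchy.
    print(sm.stats.anova_lm(fit, typ=1))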
<< Today, Burke Johnson sent an SPSS-worked example to Don B., with a
CC: to me, since I had posted earlier. The example was supposed to show
that different codings give different results. The example shows that
the total SS and test is always the same. And the example shows that
different codings can give you different results for coefficients and
tests when you look at Main effects when Interactions are already in
the equation -- which is entirely consistent with what Don and I (I
think) have both said, ... >>
Certainly this is what I would expect. Unfortunately, I do not have
access to SPSS, and therefore have not been able to observe Burke's
example (which was an attached SPSS system file, and an SPSS printout
of the kind that can only be printed by SPSS). (I much prefer raw data
files with descriptors -- possibly an SPSS command file with embedded
data -- and straight ASCII text output. Both take up much less space
than encoded files like SPSS system files and spooled-output files.)
The phrase "results for coefficients" is a bit ambiguous.
Packages routinely provide for each coefficient a formal test of the
hypothesis that the population value of the coefficient is zero;
but such tests are conditional on the presence of all the other effects
in the model, at least when no attempt has been made to make the
modelled effects mutually orthogonal. And it is well known that when
predictors are highly correlated, tests of their coefficients are likely
to appear non-significant, even when a quite large (and significant)
proportion of the variance of the dependent variable is explained by the
predictors jointly.
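The phenomenon is easy to demonstrate with a toy simulation (Python
again; nothing to do with the PULSE data, and the numbers are arbitrary):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 100
    x1 = rng.normal(size=n)
    x2 = x1 + rng.normal(scale=0.05, size=n)   # x2 nearly collinear with x1
    y  = x1 + x2 + rng.normal(size=n)

    fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
    print(fit.pvalues[1:])   # each t-test typically non-significant
    print(fit.f_pvalue)      # joint F-test highly significant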
(That was one of the points in the paper cited above. With
the original coding in the data file -- (1,2) for the three binary
variables -- and with interaction variables constructed by just
multiplying these together, Minitab refused to fit a full model with
all 15 predictors (up to the 4-way interaction SEX*RAN*SMOKES*PULSE1);
and for the 14 variables it did deign to use, NONE of the coefficients
was significant, because of the spurious intercorrelations among products
arising from that choice of coding (and from failing to orthogonalize
the products). But when all the _interactions_ were orthogonalized
with respect to their main effects and lower-order interactions, one
analysis sufficed to reduce the original 15 predictors to the six cited
above; and then USING the fact of correlation between raw variables
(now recoded to (0,1)) and their simple products, the model was further
reduced to four predictors (also shown above).)
<< i.e., those effects should be presumed to be uninterpretable ...>>
I would never so presume. I might agree that interpretation is,
sometimes, difficult; I might acknowledge that I had not (yet) found
a satisfying interpretation; but to assert "uninterpretable" seems
rather like refusing a challenge just because it looks not-easy.
<< -- so the illustration just heightens the question of whether
those effects are *ever* interpretable, ... >>
Did I succeed in showing a workable interpretation (of the Pulse data),
do you think?
<< ... since the inconsistency proves that they are not strictly
interpretable for most sets of coefficients. >>
What "inconsistency"? (I may be hampered by not having observed Burke's
example; I take it Rich refers to different coefficients, or perhaps to
different p-values, for differently-coded variables. This is not an
inconsistency, it's a natural feature of the landscape. If one permits
interactions to be unnecessarily correlated with other interactions and/or
main effects, it should not be surprising that (a) arriving at a suitable
reduced model is more troublesome than it need be and (b) significance
tests on the coefficients (but not on the _effects_ -- that is, on the
several SSs) are sensitive to the way in which the interaction terms were
constructed.)
(I'm not at all sure I understand what you want to mean by
"strictly interpretable". Did you have some kind of "non-strictly" in
mind? "Fuzzily", perhaps?)
-- Don.
------------------------------------------------------------------------
Donald F. Burrill [EMAIL PROTECTED]
348 Hyde Hall, Plymouth State College, [EMAIL PROTECTED]
MSC #29, Plymouth, NH 03264 603-535-2597
184 Nashua Road, Bedford, NH 03110 603-471-7128