Sorry, this turned out to be rather longer than I'd anticipated. 
Maybe I should have broken it into parts...

On Wed, 22 Dec 1999, Rich Ulrich wrote:

<< There were several earlier messages, and then 
  I thought Don Burrill said most of what needed to be said -- >>

                [ snip, various details.]

<< Later Don recommended constructing dummies for the Interactions in
  such a manner so that they would be orthogonal to the main effects, 
  in order to reduce confusion of confounded interpretations;  that 
  bit of advice received a minor criticism from someone else who 
  pointed out that you should never be trying to interpret *those* 
  coefficients in the first place. >>

Never say "never"!  See below ...  :-)

<< Well, I agree with both Don and the critic.  I create my 
  interactions as orthogonal, or approximately orthogonal -- in the 
  old days, your program was too likely to blow up if you did not 
  get rid of all the numerical problems you could, whenever you 
  could.  Further, if I happen to look at the wrong listing, it will 
  still have numbers that are in the right range, and PROBABLY right. 
  Finally, it may be a cheap piece of consistency, but it gives me 
  one less item that I have to explain to the non-statisticians who 
  look at various results.

<< Like the critic, though, I never want to interpret the coded main
  effects in any regression that has also included the interactions. >>

        Why not?  If an interaction is significant, do you mean to 
say that you would never attempt to make sense of the main effects, 
or of the pattern of results involving (main effects + interactions)?  
Surely you cannot mean that you would conduct an analysis _without_ 
the interactions, and attempt to interpret the main effects in such 
an analysis -- these results would likely be horribly misleading, if 
interaction effects are in fact present.  What would you do in the 
context of, say, a four-factor ANOVA?  Would you take the presence of 
significant interactions to imply that main effects cannot, or perhaps 
should not, be interpreted? 

<<  - I would not mind receiving guidance on this final point.  It is
*conceivable*  to use codings so that the coefficients and tests for
main effects do have meaning when the interaction is included in a
regression.  >>

Let me describe an example.  In the Minitab data set PULSE there are 
eight variables:  
        an initial pulse rate (PULSE1), 
        a second pulse rate (PULSE2, taken 1 minute after PULSE1), 
        a binary experimental/control variable (RAN, = 1 if the 
                respondent ran in place during the minute between pulse 
                measurements, = 2 if the respondent sat quietly during 
                the minute), 
        SMOKES (= 1 if the respondent smoked regularly, = 2 if not), 
        SEX (= 1 if male, = 2 if female), 
        HEIGHT (in inches), 
        WEIGHT (in lbs), and 
        ACTIVITY (= 1 if the respondent reported little or no regular 
                exercise or physical activity, = 2 if the respondent 
                reported a moderate amount, = 3 if "a lot").  

We focus on the first five variables.

        If we model PULSE2 as a function of (PULSE1, RAN, SMOKES, SEX) 
we might have a 3-way analysis of covariance with one covariate (PULSE1). 
Since the data are unbalanced, practically all ANCOVA programs will choke 
on them and refuse to yield an analysis.  So we use multiple regression. 
(Which, among other virtues, permits us to examine interactions of the 
binary factors and PULSE1;  this would be impossible in "classical" (?!) 
ANCOVA.)  Skipping the boring details (which may be found in a paper of 
mine on the Minitab home page), a reasonable reduced model retains as 
predictors PULSE1, RAN, SEX, SMOKES, SMOKES*RAN, and SEX*RAN, when 
SMOKES*RAN and SEX*RAN have been constructed as orthogonal to SMOKES and 
RAN, and to SEX and RAN, respectively. 
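        (In case the mechanics are not obvious:  one way to construct an
interaction column orthogonal to its main effects is to regress the raw
product on those main effects and keep the residuals.  A minimal sketch
follows, in Python just to make the arithmetic concrete -- not what I
actually used;  the data frame `df`, the column names, and the helper
function are assumptions based on the PULSE description above:

import numpy as np

# df is assumed to be a pandas DataFrame holding the PULSE variables.
def orthogonalize(product, *mains):
    # Residuals from regressing the raw product on an intercept plus its
    # main effects: the part of the product orthogonal to those columns.
    X = np.column_stack([np.ones(len(product))]
                        + [np.asarray(m, float) for m in mains])
    y = np.asarray(product, float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

df["SMOKESxRAN"] = orthogonalize(df["SMOKES"] * df["RAN"],
                                 df["SMOKES"], df["RAN"])
df["SEXxRAN"]    = orthogonalize(df["SEX"] * df["RAN"],
                                 df["SEX"], df["RAN"])

The orthogonalized columns can then be offered to the regression along
with the main effects, and their coefficients and SSs are no longer
confounded with those of SMOKES, SEX, and RAN.)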
        Interestingly, if one recodes RAN, SEX, and SMOKES to (0,1) for 
ease of interpretation (RAN = 1 if ran in place, = 0 if not;  SMOKES = 1 
if smoker, 0 if not;  SEX = 1 if female, = 0 if male), constructs the two 
interactions as the simple products of their components (SEX*RAN = 1 if a 
female who ran, = 0 else;  SMOKES*RAN = 1 if a smoker who ran, = 0 else), 
and then fits the regression model

        PULSE2 = a + b1*PULSE1 + b2*RAN + b3*SMOKES + b4*SEX 
                + b5*SEX*RAN + b6*SMOKES*RAN + error

the coefficients b3 and b4 are indistinguishable from zero.  Dropping 
SEX and SMOKES from the model, we then have

  PULSE2 = a + b1*PULSE1 + b2*RAN + b5*SEX*RAN + b6*SMOKES*RAN + error

Now some folks get unhappy at the idea of retaining an interaction 
(e.g., SEX*RAN) without one of its main effects (e.g., SEX);  but look 
at what we have:  
 (1) A common slope of PULSE2 on PULSE1 for all subgroups (b1),
        which may be not unreasonable;
 (2) a common line for everyone who did not run, which is reasonable
        (then RAN = SEX*RAN = SMOKES*RAN = 0):
        this line is  PULSE2 = a + b1*PULSE1;
 (3) an increment in pulse rate (= b2) for those who ran in place,
        as one would expect;
 (4) an additional increment (= b5) for females who ran in place,
        which one might not expect;
 (5) a (small, as it turned out) decrement (= b6) for smokers who ran 
        in place, which might not be surprising.

These convenient interpretations arise from using indicator variables 
with (0,1) coding (and from being moderately clever in deciding what 
level to assign to 0 in each case).  But getting to this reduced model 
is difficult without going through orthogonalizing routines on the way, 
because of the spurious intercorrelations among interaction terms, most 
of which turn out not to be significant.  Having reduced the predictors 
to the six shown above, it is then convenient to use (0,1) coding because 
of the kinds of interpretation that thus become possible;  and then one 
observes, as a function of this coding in particular, that the "main 
effects" for SMOKES and SEX vanish, leaving only the two product terms. 
Moreover, the product terms themselves are conveniently interpretable, 
being indicator variables (indicating a particular subgroup in the data).
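        (For concreteness, the recoding and the reduced fit could be
reproduced with something like the following sketch -- Python with
statsmodels, which again is not what I used;  `df` is assumed to hold
the PULSE variables in their original (1,2) coding:

import statsmodels.formula.api as smf

# (0,1) indicators, choosing which level gets the zero in each case
df["RAN01"]    = (df["RAN"]    == 1).astype(int)   # 1 = ran in place
df["SEX01"]    = (df["SEX"]    == 2).astype(int)   # 1 = female
df["SMOKES01"] = (df["SMOKES"] == 1).astype(int)   # 1 = smoker

# The products are themselves indicators of particular subgroups
df["SEX_RAN"]    = df["SEX01"]    * df["RAN01"]    # female who ran
df["SMOKES_RAN"] = df["SMOKES01"] * df["RAN01"]    # smoker who ran

fit = smf.ols("PULSE2 ~ PULSE1 + RAN01 + SEX_RAN + SMOKES_RAN",
              data=df).fit()
print(fit.params)   # a, b1, b2, b5, b6 in the notation above

Reading off the coefficients then gives exactly the lines described in
(2) through (5) above:  a common line for non-runners, an increment b2
for those who ran, a further increment b5 for females who ran, and a
small decrement b6 for smokers who ran.)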

<< If it is what I remember seeing in an ANOVA text many
years ago, the weights and coefficients can be constructed to take
into account the Ns of the cells (more complicated than -1,0,1). 
 I believe:  The test that this gives you for main effects is either
exactly the same as some other way of constructing the problem, or it
is considered obsolete.  >>

I remember knowing that one could do this.  Never saw much point in 
it, though;  in nearly every case of unequal N's I've encountered, 
I've been much more interested in "unweighted" (which really means 
equally weighted) means than in means weighted by the cell N's. 
I can see that others might feel differently;  this is, in essence, 
the same debate that arose in the Constitutional Convention, and which 
was resolved by the compromise that certain matters would be dealt with 
in a Senate (unweighted by population, equally weighted by jurisdiction) 
and others in a House of Representatives (weighted by population within 
jurisdictions).
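        (To be explicit about the distinction:  the "unweighted" mean for
a level of one factor is the simple average of its cell means, whereas
the weighted mean is the average of all the observations at that level,
so that larger cells count for more.  A small sketch, with hypothetical
factor names A and B and response y in a data frame `df`:

import pandas as pd

# df is assumed to contain factors "A" and "B" (with unequal cell Ns)
# and a response "y".
cell_means = df.groupby(["A", "B"])["y"].mean()

weighted   = df.groupby("A")["y"].mean()           # weighted by cell Ns
unweighted = cell_means.groupby(level="A").mean()  # cells equally weighted

With equal cell Ns the two of course coincide.)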

<< The construction that I like is Searle's partitioning of sums of 
squares, usually in a hierarchy:   (A), (B|A), (AB|A,B)  [e.g.]. >>

Yes, and that's one of the reasons I like Minitab's regression output, 
because it provides these hierarchical sums of squares (sequential SS 
is Minitab's label for them), in the order in which the predictors 
were specified in the regression command.
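        (The same partitioning is easy to produce elsewhere.  As a sketch
-- Python/statsmodels this time, with `df`, `y`, `A`, and `B` merely
placeholder names -- the Type I sums of squares reported by anova_lm are
exactly these sequential SSs, each term adjusted only for the terms that
precede it in the formula:

import statsmodels.api as sm
import statsmodels.formula.api as smf

# Sequential (Type I) SS:  (A), (B|A), (A*B | A, B), in the order the
# terms appear in the model formula.
fit = smf.ols("y ~ A + B + A:B", data=df).fit()
print(sm.stats.anova_lm(fit, typ=1))

Changing the order of the terms changes the partition, just as changing
the order of predictors in Minitab's regression command does.)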

<< Today, Burke Johnson sent an SPSS-worked example to Don B., with a 
CC: to me, since I had posted earlier.  The example was supposed to show
that different codings give different results.  The example shows that
the total SS and test is always the same.  And the example shows that
different codings can give you different results for coefficients and
tests when you look at Main effects when Interactions are already in
the equation -- which is entirely consistent with what Don and I (I
think) have both said, ... >>

Certainly this is what I would expect.  Unfortunately, I do not have 
access to SPSS, and therefore have not been able to observe Burke's 
example (which was an attached SPSS system file, and an SPSS printout 
of the kind that can only be printed by SPSS).  (I much prefer raw data 
files with descriptors -- possibly an SPSS command file with embedded
data -- and straight ASCII text output.  Both take up much less space 
than encoded files like SPSS system files and spooled-output files.)
        The phrase "results for coefficients" is a bit ambiguous.  
Packages routinely provide for each coefficient a formal test of the 
hypothesis that the population value of the coefficient is zero;  
but such tests are conditional on the presence of all the other effects 
in the model, at least when no attempt has been made to make the 
modelled effects mutually orthogonal.  And it is well known that when 
predictors are highly correlated, tests of their coefficients are likely 
to appear non-significant, even when a quite large (and significant) 
proportion of the variance of the dependent variable is explained by the 
predictors jointly.
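        (A tiny simulated illustration of that last point -- again in
Python, and every name here is made up for the example:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)        # x2 nearly collinear with x1
y  = 1.0 + x1 + x2 + rng.normal(size=n)

fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
print(fit.pvalues[1:])   # individual t-tests: typically non-significant
print(fit.f_pvalue)      # joint F-test: overwhelmingly significant

Each predictor, tested conditionally on the other, appears to contribute
little;  jointly they explain most of the variance of y.)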
        (That was one of the points in the paper cited above.  Using 
the original coding in the data file -- (1,2) for the three binary 
variables -- and constructing interaction variables by just multiplying 
these together, Minitab refused to fit a full model with all 15 
predictors (up to the 4-way interaction SEX*RAN*SMOKES*PULSE1), and 
for the 14 variables it did deign to use, NONE of the coefficients was 
significant, because of the spurious intercorrelations among products 
arising from that choice of coding (and from failing to orthogonalize 
the products).  But when all the _interactions_ were orthogonalized 
with respect to their main effects and lower-order interactions, one 
analysis sufficed to reduce the original 15 predictors to the six cited
above;  and then USING the fact of correlation between raw variables 
(now recoded to (0,1)) and their simple products, the model was further 
reduced to four predictors (also shown above).)

<< i.e., those effects should be presumed to be uninterpretable ...>>

I would never so presume.  I might agree that interpretation is, 
sometimes, difficult;  I might acknowledge that I had not (yet) found 
a satisfying interpretation;  but to assert "uninterpretable" seems 
rather like refusing a challenge just because it looks not-easy.

<< -- so the illustration just heightens the question of whether 
those effects are *ever* interpretable, ... >>

Did I succeed in showing a workable interpretation (of the Pulse data), 
do you think?

<< ... since the inconsistency proves that they are not strictly 
interpretable for most sets of coefficients. >>

What "inconsistency"?  (I may be hampered by not having observed Burke's 
example;  I take it Rich refers to different coefficients, or perhaps to 
different p-values, for differently-coded variables.  This is not an 
inconsistency, it's a natural feature of the landscape.  If one permits
interactions to be unnecessarily correlated with other interactions and/or 
main effects, it should not be surprising that (a) arriving at a suitable 
reduced model is more troublesome than it need be and (b) significance 
tests on the coefficients (but not on the _effects_ -- that is, on the 
several SSs) are sensitive to the way in which the interaction terms were 
constructed.)
        (I'm not at all sure I understand what you want to mean by 
"strictly interpretable".  Did you have some kind of "non-strictly" in 
mind?  "Fuzzily", perhaps?)
                                -- Don.
 ------------------------------------------------------------------------
 Donald F. Burrill                                 [EMAIL PROTECTED]
 348 Hyde Hall, Plymouth State College,          [EMAIL PROTECTED]
 MSC #29, Plymouth, NH 03264                                 603-535-2597
 184 Nashua Road, Bedford, NH 03110                          603-471-7128  
