Hi Rachel,

I think at times like this it's useful to look at exactly how R codes
factors. When you add interactions, R does a lot of behind-the-scenes work
that isn't immediately apparent. One way to look into this in more detail
is the really nice function "model.matrix", which, given a model formula
and a data frame, shows you all of the coding variables that are created
in order to fit the model and what their values are for each combination of
factors in the dataset. I've bolded this below.

# create data frame w/ each factor level combo
d <- data.frame(Listener.f = rep(c("Listener1", "Listener2"), 3),
                SyntaxType.f = c(rep("Syntax1", 2), rep("Syntax2", 2),
                                 rep("Syntax3", 2)),
                Target_E2_pref = rnorm(6))
# make factor
d$Listener.f <- factor(d$Listener.f)
d$SyntaxType.f <- factor(d$SyntaxType.f)

*# create model formulas corresponding to full and reduced model*
*mod.formula <- formula(~ 1 + Listener.f * SyntaxType.f, d)*

*mod.formula.reduced <- formula(~ 1 + SyntaxType.f + Listener.f:SyntaxType.f, d)*
*# get var assignments for all factor level combos*
*mod.matrix <- model.matrix(mod.formula, d)*
*mod.matrix.reduced <- model.matrix(mod.formula.reduced, d)*

If you look at mod.matrix and mod.matrix.reduced, you'll see that they have
the same dimensions: 6 rows (one per factor-level combination) and 6
columns. Digging in further, we can see why. Let's look at the column names
of each model matrix:

colnames(mod.matrix):
[1] "(Intercept)"
*[2] "Listener.fListener2"*
[3] "SyntaxType.fSyntax2"
[4] "SyntaxType.fSyntax3"
[5] "Listener.fListener2:SyntaxType.fSyntax2"
[6] "Listener.fListener2:SyntaxType.fSyntax3"

colnames(mod.matrix.reduced):
[1] "(Intercept)"
[2] "SyntaxType.fSyntax2"
[3] "SyntaxType.fSyntax3"
*[4] "SyntaxType.fSyntax1:Listener.fListener2"*
[5] "SyntaxType.fSyntax2:Listener.fListener2"
[6] "SyntaxType.fSyntax3:Listener.fListener2"

I've bolded the differences. Now don't ask me why, but the way that R
*appears* to handle removing a main effect from a model while keeping the
interaction is to add another interaction dummy variable that makes the
model equivalent. (If you look at the values that each factor combo takes
on, you'll see that this particular dummy variable is 1 when Listener =
Listener2 and SyntaxType = Syntax1, and 0 otherwise.)
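For what it's worth, this follows R's rule for coding a factor inside a term (described for the "factors" attribute in ?terms.object): a factor is coded by contrasts when the term with that factor deleted is also in the model, and by a full set of indicator columns otherwise. In Listener.f:SyntaxType.f with no Listener.f main effect, SyntaxType.f therefore gets all three indicators, which is where the Syntax1 column comes from. The sketch below rebuilds the toy data from above, repeating each cell three times (an assumption, so that the rank check is not trivially satisfied), prints that coding, and checks that the two matrices span the same column space. The helper rank.of is hypothetical.

```r
# Toy data as in the message above, with each Listener x SyntaxType cell
# repeated three times so the rank check at the end is informative.
d <- data.frame(
  Listener.f   = factor(rep(c("Listener1", "Listener2"), 3)),
  SyntaxType.f = factor(rep(c("Syntax1", "Syntax2", "Syntax3"), each = 2))
)
d <- d[rep(seq_len(nrow(d)), times = 3), ]

mod.formula.reduced <- ~ 1 + SyntaxType.f + Listener.f:SyntaxType.f
mod.matrix <- model.matrix(~ 1 + Listener.f * SyntaxType.f, d)
mod.matrix.reduced <- model.matrix(mod.formula.reduced, d)

# terms() records how each variable is coded in each term:
# 1 = by contrasts, 2 = by a full set of indicator columns (one per level).
fac <- attr(terms(mod.formula.reduced), "factors")
print(fac)

# The extra column is 1 only in the Listener2 / Syntax1 cell.
extra <- mod.matrix.reduced[, "SyntaxType.fSyntax1:Listener.fListener2"]
print(unique(cbind(d, extra)))

# Both matrices have rank 6 (one per cell), and putting them side by side
# does not raise the rank: same column space, hence identical fits.
rank.of <- function(m) qr(m)$rank  # hypothetical helper
print(c(full     = rank.of(mod.matrix),
        reduced  = rank.of(mod.matrix.reduced),
        combined = rank.of(cbind(mod.matrix, mod.matrix.reduced))))
```

Because both matrices are saturated (one free parameter per cell), any model built on either one fits exactly the same cell means, which is why the anova() comparison further down the thread reports 0 df.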

The way to solve this is presented in the paper Roger linked earlier in the
thread (p. 4 is the most relevant here). His example uses contrast coding,
but you can make it work in exactly the same way with dummy coding (*but
make sure that dummy coding is what you really want to use given the
specific hypothesis you're testing!*):

# make numeric versions of factors
# (can easily replace contr.treatment w/ whatever coding scheme you want)
d$Listener.numeric <- sapply(d$Listener.f, function(i)
  contr.treatment(2)[i, ])
# Syntax1.numeric / Syntax2.numeric are the 1st / 2nd columns of
# contr.treatment(3), i.e. indicators for Syntax2 / Syntax3
d$Syntax1.numeric <- sapply(d$SyntaxType.f, function(i)
  contr.treatment(3)[i, ])[1, ]
d$Syntax2.numeric <- sapply(d$SyntaxType.f, function(i)
  contr.treatment(3)[i, ])[2, ]
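The same numeric columns can also be built without sapply(), by looking up each observation's row of the contrast matrix directly through the factor's integer codes. This is a sketch of an equivalent alternative, not the method from Roger's paper; the names ctr.listener and ctr.syntax are hypothetical.

```r
# Toy data as in the message above.
d <- data.frame(
  Listener.f   = factor(rep(c("Listener1", "Listener2"), 3)),
  SyntaxType.f = factor(rep(c("Syntax1", "Syntax2", "Syntax3"), each = 2))
)

# One row per factor level, one column per numeric predictor.
# Swap in contr.sum(), contr.helmert(), etc. for a different coding scheme.
ctr.listener <- contr.treatment(2)
ctr.syntax   <- contr.treatment(3)

# Row i of the contrast matrix belongs to level i, so index it with the
# factor's integer codes (levels are in alphabetical order here).
d$Listener.numeric <- ctr.listener[as.integer(d$Listener.f), 1]
d$Syntax1.numeric  <- ctr.syntax[as.integer(d$SyntaxType.f), 1]  # 1 for Syntax2
d$Syntax2.numeric  <- ctr.syntax[as.integer(d$SyntaxType.f), 2]  # 1 for Syntax3
print(d)
```

Keeping the contrast matrix in one variable makes it easy to switch coding schemes in a single place.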

# check model matrix
mod.formula.new <- formula(~ 1 + Syntax1.numeric + Syntax2.numeric +
  Listener.numeric:Syntax1.numeric + Listener.numeric:Syntax2.numeric, d)
mod.matrix.new <- model.matrix(mod.formula.new, d)

colnames(mod.matrix.new):
[1] "(Intercept)"
[2] "Syntax1.numeric"
[3] "Syntax2.numeric"
[4] "Syntax1.numeric:Listener.numeric"
[5] "Syntax2.numeric:Listener.numeric"

Now things are as they should be: no more mysterious extra dummy variable
containing information about the main effect of Listener! This last model
is the one you should compare your original model against to get the
significance of the main effect of Listener (with dummy coding, that is the
effect of Listener at the reference level of SyntaxType, Syntax1).
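To see the difference in an actual fit rather than a model matrix, here is a sketch on simulated binary data (hypothetical: 60 responses per cell, true probability 0.5). It uses ordinary glm() instead of glmer() so it runs in base R; the fixed-effects part of a glmer formula should be coded the same way. The factor-coded "reduced" model reproduces the full model exactly, while the numeric version gives a genuine 1-df likelihood-ratio test.

```r
set.seed(1)
# Hypothetical simulated data: 60 binary responses per Listener x SyntaxType cell.
d <- expand.grid(Listener.f   = factor(c("Listener1", "Listener2")),
                 SyntaxType.f = factor(c("Syntax1", "Syntax2", "Syntax3")),
                 obs          = 1:60)
d$Target_E2_pref <- rbinom(nrow(d), size = 1, prob = 0.5)

# Treatment-coded numeric predictors, as in the message above.
d$Listener.numeric <- contr.treatment(2)[as.integer(d$Listener.f), 1]
d$Syntax1.numeric  <- contr.treatment(3)[as.integer(d$SyntaxType.f), 1]
d$Syntax2.numeric  <- contr.treatment(3)[as.integer(d$SyntaxType.f), 2]

fit.full <- glm(Target_E2_pref ~ Listener.f * SyntaxType.f,
                family = binomial, data = d)
# Factor version: R adds the Syntax1:Listener2 column back, so nothing is removed.
fit.factor.reduced <- glm(Target_E2_pref ~ SyntaxType.f + Listener.f:SyntaxType.f,
                          family = binomial, data = d)
# Numeric version: five columns instead of six.
fit.numeric.reduced <- glm(Target_E2_pref ~ Syntax1.numeric + Syntax2.numeric +
                             Listener.numeric:Syntax1.numeric +
                             Listener.numeric:Syntax2.numeric,
                           family = binomial, data = d)

# Same deviance and residual df: a 0-df "comparison", like the one in the thread.
print(c(full = deviance(fit.full), factor.reduced = deviance(fit.factor.reduced)))
print(c(full = df.residual(fit.full), factor.reduced = df.residual(fit.factor.reduced)))

# A genuine 1-df likelihood-ratio test for the (dummy-coded) Listener effect.
print(anova(fit.numeric.reduced, fit.full, test = "Chisq"))
```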

Hope this was helpful!


On Tue, Sep 20, 2016 at 12:26 PM Levy, Roger <rl...@ucsd.edu> wrote:

> Hi Dan,
> I’m having a bit of trouble figuring out exactly how your two comments
> comport with one another, but I think the crucial point here is that the
> procedure I outline in the paper is simply how to do exactly what is done
> in traditional ANOVA analyses.  In this approach, the expected effect size
> of, for example, ListenerType does not depend on the relative amounts of
> data in the various levels of SyntaxType (which is what I think you’re
> referring to by “balance of the levels”).
> Your caveat regarding whether the main effect of a factor X1 necessarily
> has a sensible interpretation in the presence of the interaction between X1
> and X2 is certainly appropriate.  In the beginning of the paper I have a
> few remarks on the caution that should be applied.  I do think that for
> factorial ANOVA analyses the main effect can often have a useful
> interpretation as the “across-the-board” effect that X1 has, regardless of
> the value of X2 (which once again is the traditional ANOVA interpretation
> of a main effect).
> If my responses to your comments aren’t on target, I would be very glad
> for clarification!
> Best
> Roger
> On Sep 19, 2016, at 7:31 PM, Daniel Ezra Johnson <
> danielezrajohn...@gmail.com> wrote:
> If you follow this procedure, though, you'd be testing for the effect of
> Listener when SyntaxType is "in the middle" (unweighted) of the three
> levels. This quantity not only has no sensible interpretation, it also
> depends on the balance of the levels of SyntaxType in the data.
> If the effect of Listener is in the same direction for each level of
> SyntaxType, such a test might be useful, but otherwise I don't think it
> would be?
> Dan
> On Sep 19, 2016, at 8:00 PM, Daniel Ezra Johnson <
> danielezrajohn...@gmail.com> wrote:
> > it also depends on the balance of the levels of SyntaxType in the data.
> Well not quite. The idea of testing the "middle level" is right, and
> whatever this means doesn't change if the balance of data changes across
> levels...
> But you could have two data sets where the effects of listener for each
> level of SyntaxType are the same (between the two data sets), but the
> significance of this test changes…
> On Mon, Sep 19, 2016 at 2:45 PM, Levy, Roger <rl...@ucsd.edu> wrote:
>> Hi Rachel,
>> If your goal is to test the main effect of Listener in the presence of
>> the Listener-SyntaxType interaction, as would typically be done in
>> traditional ANOVA analyses, I recommend you read this brief paper I wrote a
>> few years ago on how to do this:
>>   http://arxiv.org/abs/1405.2094
>> It is exactly targeted at this problem, and explains why you’re getting
>> the behavior you report due to differences in how R treats factors versus
>> numeric variables in formulae.  (Setting the contrasts on the factor has no
>> impact.)
>> I have no explanation for why, as you report, you don’t get this
>> problem when you test for the main effect of SyntaxType; if you give
>> further details, we might be able to help further!
>> Best
>> Roger
>> On Sep 18, 2016, at 5:57 PM, Rachel Ostrand <rostr...@cogsci.ucsd.edu>
>> wrote:
>> Hi everyone,
>> I'm having trouble with some 2-factor glmer models that I'm trying to
>> run, such that the model with one of the main effects removed is coming out
>> identical to the full model. Some colleagues suggested that this might be
>> due to the coding of my factors, specifically because I have a factor that
>> has 3 levels, and that one needs to be treated differently, but I'm not
>> sure how - or why - to do that.
>> Brief summary of my data:
>> -My DV (called Target_E2_pref) is a binary categorical variable.
>> -There are two categorical IVs: Listener (2 levels) and SyntaxType (3
>> levels).
>> -Listener varies by both subject and item (i.e., picture); SyntaxType
>> only varies by subject.
>> When I dummy coded my variables using contr.treatment(), the model with
>> the main effect of Listener removed from the fixed effects comes out
>> identical to the full model:
>> SoleTrain = read.table(paste(path, "SoleTrain.dat", sep=""), header=T)
>> SoleTrain$Listener.f = factor(SoleTrain$Listener, labels=c("E1", "E2"))
>> contrasts(SoleTrain$Listener.f) = contr.treatment(2)
>> SoleTrain$SyntaxType.f = factor(SoleTrain$SyntaxType,
>>     labels=c("Transitive", "Locative", "Dative"))
>> contrasts(SoleTrain$SyntaxType.f) = contr.treatment(3)
>> # Create full model:
>> SoleTrain.full <- glmer(Target_E2_pref ~ Listener.f*SyntaxType.f +
>>     (1 + Listener.f*SyntaxType.f|Subject) + (1 + Listener.f|Picture),
>>     data = SoleTrain, family = binomial, verbose=T,
>>     control=glmerControl(optCtrl=list(maxfun=20000)))
>> # Create model with main effect of Listener removed:
>> SoleTrain.noListener <- glmer(Target_E2_pref ~ SyntaxType.f +
>>     Listener.f:SyntaxType.f + (1 + Listener.f*SyntaxType.f|Subject) +
>>     (1 + Listener.f|Picture), data = SoleTrain, family = binomial,
>>     verbose=T, control=glmerControl(optCtrl=list(maxfun=20000)))
>> > anova(SoleTrain.full, SoleTrain.noListener)
>> Data: SoleTrain
>> Models:
>> SoleTrain.full: Target_E2_pref ~ Listener.f * SyntaxType.f + (1 +
>> Listener.f * SyntaxType.f | Subject) + (1 + Listener.f | Picture)
>> SoleTrain.noListener: Target_E2_pref ~ SyntaxType.f +
>> Listener.f:SyntaxType.f + (1 + Listener.f * SyntaxType.f | Subject) + (1 +
>> Listener.f | Picture)
>>                      Df    AIC    BIC  logLik deviance Chisq Chi Df Pr(>Chisq)
>> SoleTrain.full       30 2732.5 2908.5 -1336.2   2672.5
>> SoleTrain.noListener 30 2732.5 2908.5 -1336.2   2672.5     0      0          1
>> However, I don't have this problem when I test for the main effect of
>> SyntaxType, and remove the SyntaxType.f factor from the fixed effects.
>> (That is, this produces a different model than the full model.)
>> Someone suggested that Helmert coding was better for factors with more
>> than two levels, so I tried running the same models except with Helmert
>> coding [contrasts(SoleTrain$SyntaxType.f) = contr.helmert(3)], but the
>> models come out identical to the way they do with dummy coding. So why
>> does the model with the main effect of Listener removed come out the same
>> as the model with the main effect of Listener retained?
>> Any suggestions as to what I'm doing wrong?
>> Thanks!
>> Rachel
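Roger's and Dan's messages above turn on what the contrast-coded (ANOVA-style) Listener effect actually measures. Here is a small Gaussian sketch, assuming sum (+1/-1) coding via contr.sum (one common contrast-coding scheme, not necessarily the exact one in Roger's paper) and deliberately unbalanced, hypothetical cell sizes. The Listener coefficient comes out as half the unweighted average, across the three SyntaxType levels, of the Listener1 minus Listener2 difference in cell means (Dan's "middle" of the levels), computed the same way however much data each cell has. Dropping only the numeric Listener column then gives a 1-df test. The names L, S1 and S2 are hypothetical.

```r
set.seed(2)
# Hypothetical, deliberately unbalanced data: different numbers of rows per cell.
cells <- expand.grid(Listener.f   = c("Listener1", "Listener2"),
                     SyntaxType.f = c("Syntax1", "Syntax2", "Syntax3"))
n.per.cell <- c(5, 20, 8, 40, 12, 3)
d <- cells[rep(seq_len(nrow(cells)), times = n.per.cell), ]
d$y <- rnorm(nrow(d), mean = 0.5 * (d$Listener.f == "Listener2"))

# Full model with sum coding on both factors.
fit <- lm(y ~ Listener.f * SyntaxType.f, data = d,
          contrasts = list(Listener.f = "contr.sum", SyntaxType.f = "contr.sum"))

# With contr.sum(2), Listener1 = +1 and Listener2 = -1, so the Listener
# coefficient is half the unweighted mean (over SyntaxType levels) of the
# Listener1 - Listener2 difference in cell means.
cell.means <- tapply(d$y, list(d$Listener.f, d$SyntaxType.f), mean)
unweighted.diff <- mean(cell.means["Listener1", ] - cell.means["Listener2", ])
print(c(coefficient = coef(fit)[[2]], half.unweighted.diff = unweighted.diff / 2))

# Numeric sum-coded predictors; the reduced model drops only the Listener column.
d$L  <- contr.sum(2)[as.integer(d$Listener.f), 1]
d$S1 <- contr.sum(3)[as.integer(d$SyntaxType.f), 1]
d$S2 <- contr.sum(3)[as.integer(d$SyntaxType.f), 2]
fit.noL <- lm(y ~ S1 + S2 + L:S1 + L:S2, data = d)
comparison <- anova(fit.noL, fit)
print(comparison)
```

In this linear case the 1-df F statistic equals the square of the t statistic for the Listener coefficient in the full model, so the model comparison and the coefficient test agree.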
