Guys,

Just a quick note, in case it's not apparent to everyone (I had emailed
this earlier to Rachel): what happens in Rachel's model is simply that R
defaults to simple-effects coding when a 'main' effect is removed while the
interaction is still included (note that this, I think, overrides whatever
contrasts you have specified for the factor you remove). That's actually a
very useful default. To me, the thing that was puzzling at first is the
same thing that Roger commented on: it should work out the same whether you
remove a two-level or a three-level factor. Indeed, when I tried to
replicate Rachel's problem, I did (and do) get the same simple-effects
reparameterization regardless of how many levels the removed factor has.

Florian

On Tue, Sep 20, 2016 at 2:41 PM Wednesday Bushong <
wednesday.bush...@gmail.com> wrote:

> Let me also say something w.r.t. coding because I think you also expressed
> doubt about what kind of coding scheme to use.
>
> The crucial thing to remember when interpreting coefficients from an R
> model summary is that a coefficient gives the change in the outcome when
> that particular variable moves from 0 to 1, with all of the other
> variables held at 0.
>
> In the case of dummy coding, then, the "main effect" of Listener is
> actually the difference in log-odds going from the first level of Listener
> to the second level of Listener *when the two SyntaxType dummy variables
> are at 0* -- that is, when SyntaxType is at its first level. So this is
> really just a pairwise comparison between two groups, and says nothing
> about the average effect of Listener across the SyntaxType groups. To get
> an interpretation of Listener as the effect averaged over all SyntaxType
> groups, you would have to contrast code SyntaxType (because then 0 is the
> average of all its levels). Similar interpretations hold in a fully
> dummy-coded model for the other main-effect terms (i.e., each SyntaxType
> effect is interpreted w.r.t. the reference level of Listener) and for the
> interaction terms (each Listener:SyntaxType coefficient is the difference
> between the Listener effect at that SyntaxType level and the Listener
> effect at the reference level; notice that this is not the same as the
> contrast-coded interaction you may be used to from ANOVA! So be careful
> with coding!).
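> To make the dummy-vs-contrast difference concrete, here is a small
> simulated sketch (made-up balanced data, not Rachel's) comparing the
> Listener coefficient under the two coding schemes:

```r
# Simulated sketch: balanced 2 (Listener) x 3 (SyntaxType) design, made-up data
set.seed(1)
d <- expand.grid(Listener   = factor(c("L1", "L2")),
                 SyntaxType = factor(c("S1", "S2", "S3")))
d <- d[rep(1:6, each = 20), ]
d$y <- rnorm(nrow(d))
cellmeans <- tapply(d$y, list(d$Listener, d$SyntaxType), mean)

# Dummy coding (R's default): the Listener coefficient is the L2-vs-L1
# difference within the reference level S1 only
m.dummy <- lm(y ~ Listener * SyntaxType, d)
all.equal(unname(coef(m.dummy)["ListenerL2"]),
          unname(cellmeans["L2", "S1"] - cellmeans["L1", "S1"]))  # TRUE

# Sum-code SyntaxType: 0 is now the unweighted mean of its three levels,
# so the Listener coefficient is the effect averaged over SyntaxType
contrasts(d$SyntaxType) <- contr.sum(3)
m.sum <- lm(y ~ Listener * SyntaxType, d)
all.equal(unname(coef(m.sum)["ListenerL2"]),
          mean(cellmeans["L2", ] - cellmeans["L1", ]))  # TRUE
```

> (Here lm() on made-up data stands in for the glmer model; the coding
> logic is the same.)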
>
> Of course, you can mix and match your coding schemes -- for instance, if
> you want to get the main effect of Listener at the avg. of SyntaxType but
> wanted pairwise comparisons of SyntaxType within one particular Listener
> group, you could contrast code SyntaxType and dummy code Listener
> appropriately -- but in general, the most common thing to do will be
> contrast coding all factors, which will give you the standard ANOVA output
> interpretation.
>
> -Wed
>
> On Tue, Sep 20, 2016 at 1:58 PM Wednesday Bushong <
> wednesday.bush...@gmail.com> wrote:
>
>> Hi Rachel,
>>
>> I think at times like this it's useful to look at exactly how R codes
>> factors. When you add interactions, R does a lot of behind-the-scenes work
>> that isn't immediately apparent. One way to look into this in more detail
>> is the very handy function model.matrix(), which, given a data frame and a
>> model formula, shows you all of the coding variables created in order to
>> fit the model and their values for each combination of factors in the
>> dataset. The formula and model.matrix() calls are the key steps below.
>>
>> # create data frame w/ each factor level combo
>> d <- data.frame(Listener.f = rep(c("Listener1", "Listener2"), 3),
>>                 SyntaxType.f = c(rep("Syntax1", 2), rep("Syntax2", 2), rep("Syntax3", 2)),
>>                 Target_E2_pref = rnorm(6))
>> # make factors
>> d$Listener.f <- factor(d$Listener.f)
>> d$SyntaxType.f <- factor(d$SyntaxType.f)
>>
>> # create model formulas corresponding to full and reduced model
>> mod.formula <- formula(~ 1 + Listener.f * SyntaxType.f, d)
>> mod.formula.reduced <- formula(~ 1 + SyntaxType.f + Listener.f:SyntaxType.f, d)
>>
>> # get variable assignments for all factor level combos
>> mod.matrix <- model.matrix(mod.formula, d)
>> mod.matrix.reduced <- model.matrix(mod.formula.reduced, d)
>>
>> If you look at mod.matrix and mod.matrix.reduced, you'll see that they
>> each have the same dimensionality. Digging in further, we can see why this
>> is. Let's look at the column names of each model matrix:
>>
>> colnames(mod.matrix)
>> [1] "(Intercept)"
>> [2] "Listener.fListener2"
>> [3] "SyntaxType.fSyntax2"
>> [4] "SyntaxType.fSyntax3"
>> [5] "Listener.fListener2:SyntaxType.fSyntax2"
>> [6] "Listener.fListener2:SyntaxType.fSyntax3"
>>
>> colnames(mod.matrix.reduced)
>> [1] "(Intercept)"
>> [2] "SyntaxType.fSyntax2"
>> [3] "SyntaxType.fSyntax3"
>> [4] "SyntaxType.fSyntax1:Listener.fListener2"
>> [5] "SyntaxType.fSyntax2:Listener.fListener2"
>> [6] "SyntaxType.fSyntax3:Listener.fListener2"
>>
>> The matrices differ in just one column: the full model's
>> "Listener.fListener2" is replaced in the reduced model by the extra
>> interaction column "SyntaxType.fSyntax1:Listener.fListener2". Now don't
>> ask me why, but the way that R *appears* to handle subtracting a main
>> effect from a model while keeping the interaction is to add another
>> interaction dummy variable that makes the two models equivalent. (If you
>> look at the values each factor combo takes on, you'll see that this
>> particular dummy variable is 1 when Listener = Listener2 and SyntaxType =
>> Syntax1, and 0 otherwise.)
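>> A quick self-contained check of that claim (re-building the same toy
>> data frame as above):

```r
# Rebuild the toy data frame and both model matrices from above
d <- data.frame(Listener.f   = factor(rep(c("Listener1", "Listener2"), 3)),
                SyntaxType.f = factor(rep(c("Syntax1", "Syntax2", "Syntax3"),
                                          each = 2)))
mod.matrix         <- model.matrix(~ 1 + Listener.f * SyntaxType.f, d)
mod.matrix.reduced <- model.matrix(~ 1 + SyntaxType.f +
                                     Listener.f:SyntaxType.f, d)

# The extra interaction column is 1 exactly when Listener = Listener2
# and SyntaxType = Syntax1 (grep avoids hard-coding the column name)
extra <- grep("Syntax1", colnames(mod.matrix.reduced), value = TRUE)
all(mod.matrix.reduced[, extra] ==
      as.integer(d$Listener.f == "Listener2" &
                 d$SyntaxType.f == "Syntax1"))  # TRUE

# And the two matrices span the same column space, i.e. the two models
# are equivalent reparameterizations of each other
qr(cbind(mod.matrix, mod.matrix.reduced))$rank == qr(mod.matrix)$rank  # TRUE
```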
>>
>> The way to solve this is presented in the paper Roger linked above (p. 4
>> is the most relevant here). His particular example uses contrast coding,
>> but you can make it work in exactly the same way with dummy coding (*but
>> make sure that dummy coding is what you really want to use given the
>> specific hypothesis you're testing!*):
>>
>> # make numeric versions of factors
>> # (can easily replace contr.treatment w/ whatever coding scheme you want)
>> d$Listener.numeric <- sapply(d$Listener.f, function(i) contr.treatment(2)[i, ])
>> d$Syntax1.numeric <- sapply(d$SyntaxType.f, function(i) contr.treatment(3)[i, ])[1, ]
>> d$Syntax2.numeric <- sapply(d$SyntaxType.f, function(i) contr.treatment(3)[i, ])[2, ]
>>
>> # check model matrix
>> mod.formula.new <- formula(~ 1 + Syntax1.numeric + Syntax2.numeric +
>>                              Listener.numeric:Syntax1.numeric +
>>                              Listener.numeric:Syntax2.numeric, d)
>> mod.matrix.new <- model.matrix(mod.formula.new, d)
>> colnames(mod.matrix.new)
>> colnames(mod.matrix.new)
>>
>> [1] "(Intercept)"
>> [2] "Syntax1.numeric"
>> [3] "Syntax2.numeric"
>> [4] "Syntax1.numeric:Listener.numeric"
>> [5] "Syntax2.numeric:Listener.numeric"
>>
>> Now things are as they should be: no more mysterious extra dummy variable
>> carrying information about the main effect of Listener! This reduced model
>> is the one you should compare your full model against to get the
>> significance of the main effect of Listener.
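>> As a self-contained sanity check (re-creating the toy data), the
>> numeric-coding trick really does remove one parameter:

```r
# Re-create the toy data and numeric codings (treatment coding as above)
d <- data.frame(Listener.f   = factor(rep(c("Listener1", "Listener2"), 3)),
                SyntaxType.f = factor(rep(c("Syntax1", "Syntax2", "Syntax3"),
                                          each = 2)))
d$Listener.numeric <- contr.treatment(2)[d$Listener.f, ]
d$Syntax1.numeric  <- contr.treatment(3)[d$SyntaxType.f, 1]
d$Syntax2.numeric  <- contr.treatment(3)[d$SyntaxType.f, 2]

full    <- model.matrix(~ 1 + Listener.f * SyntaxType.f, d)
reduced <- model.matrix(~ 1 + Syntax1.numeric + Syntax2.numeric +
                          Listener.numeric:Syntax1.numeric +
                          Listener.numeric:Syntax2.numeric, d)

# With factors, the "reduced" model had 6 columns like the full one;
# with numeric codings it is genuinely one parameter smaller
ncol(full)        # 6
ncol(reduced)     # 5
qr(reduced)$rank  # 5
```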
>>
>> Hope this was helpful!
>>
>> Best,
>> Wednesday
>>
>> On Tue, Sep 20, 2016 at 12:26 PM Levy, Roger <rl...@ucsd.edu> wrote:
>>
>>> Hi Dan,
>>>
>>> I’m having a bit of trouble figuring out exactly how your two comments
>>> comport with one another, but I think the crucial point here is that the
>>> procedure I outline in the paper is simply how to do exactly what is done
>>> in traditional ANOVA analyses.  In this approach, the expected effect size
>>> of, for example, ListenerType does not depend on the relative amounts of
>>> data in the various levels of SyntaxType (which is what I think you’re
>>> referring to by “balance of the levels”).
>>>
>>> Your caveat regarding whether the main effect of a factor X1 necessarily
>>> has a sensible interpretation in the presence of the interaction between X1
>>> and X2 is certainly appropriate.  In the beginning of the paper I have a
>>> few remarks on the caution that should be applied.  I do think that for
>>> factorial ANOVA analyses the main effect can often have a useful
>>> interpretation as the “across-the-board” effect that X1 has, regardless of
>>> the value of X2 (which once again is the traditional ANOVA interpretation
>>> of a main effect).
>>>
>>> If my responses to your comments aren’t on target, I would be very glad
>>> for clarification!
>>>
>>> Best
>>>
>>> Roger
>>>
>>>
>>> On Sep 19, 2016, at 7:31 PM, Daniel Ezra Johnson <
>>> danielezrajohn...@gmail.com> wrote:
>>>
>>> If you follow this procedure, though, you'd be testing for the effect of
>>> Listener when SyntaxType is "in the middle" (unweighted) of the three
>>> levels. This quantity not only has no sensible interpretation, it also
>>> depends on the balance of the levels of SyntaxType in the data.
>>>
>>> If the effect of Listener is in the same direction for each level of
>>> SyntaxType, such a test might be useful, but otherwise I don't think it
>>> would be?
>>>
>>> Dan
>>>
>>>
>>> On Sep 19, 2016, at 8:00 PM, Daniel Ezra Johnson <
>>> danielezrajohn...@gmail.com> wrote:
>>>
>>>
>>>
>>>  it also depends on the balance of the levels of SyntaxType in the data.
>>>
>>> Well, not quite. The idea of testing at the unweighted "middle" of the
>>> levels is right, and what that quantity means doesn't change if the
>>> balance of data changes across levels...
>>>
>>> But you could have two data sets where the effects of listener for each
>>> level of SyntaxType are the same (between the two data sets), but the
>>> significance of this test changes…
>>>
>>>
>>>
>>> On Mon, Sep 19, 2016 at 2:45 PM, Levy, Roger <rl...@ucsd.edu> wrote:
>>>
>>>> Hi Rachel,
>>>>
>>>> If your goal is to test the main effect of Listener in the presence of
>>>> the Listener-SyntaxType interaction, as would typically be done in
>>>> traditional ANOVA analyses, I recommend you read this brief paper I wrote a
>>>> few years ago on how to do this:
>>>>
>>>>   http://arxiv.org/abs/1405.2094
>>>>
>>>> It is exactly targeted at this problem, and explains why you’re getting
>>>> the behavior you report due to differences in how R treats factors versus
>>>> numeric variables in formulae.  (Setting the contrasts on the factor has no
>>>> impact.)
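>>>> A quick illustration of that last point (toy factors, not the
>>>> SoleTrain data): the model matrix for the main-effect-removed formula
>>>> has the same size and column space whether or not you change the
>>>> contrasts, so the formula never becomes a genuinely smaller model:

```r
# Toy 2 x 3 design (hypothetical, not the SoleTrain data)
d <- expand.grid(Listener   = factor(c("L1", "L2")),
                 SyntaxType = factor(c("S1", "S2", "S3")))

m.default <- model.matrix(~ SyntaxType + Listener:SyntaxType, d)

contrasts(d$Listener)   <- contr.helmert(2)
contrasts(d$SyntaxType) <- contr.helmert(3)
m.helmert <- model.matrix(~ SyntaxType + Listener:SyntaxType, d)

# Same size and same column space either way
ncol(m.default)  # 6
ncol(m.helmert)  # 6
qr(cbind(m.default, m.helmert))$rank == qr(m.default)$rank  # TRUE
```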
>>>>
>>>> I have no explanation for your reported behavior of why you don’t get
>>>> this problem when you test for the main effect of SyntaxType; if you give
>>>> further details, we might be able to help further!
>>>>
>>>> Best
>>>>
>>>> Roger
>>>>
>>>> On Sep 18, 2016, at 5:57 PM, Rachel Ostrand <rostr...@cogsci.ucsd.edu>
>>>> wrote:
>>>>
>>>> Hi everyone,
>>>>
>>>> I'm having trouble with some 2-factor glmer models that I'm trying to
>>>> run, such that the model with one of the main effects removed is coming out
>>>> identical to the full model. Some colleagues suggested that this might be
>>>> due to the coding of my factors, specifically because I have a factor that
>>>> has 3 levels, and that one needs to be treated differently, but I'm not
>>>> sure how - or why - to do that.
>>>>
>>>> Brief summary of my data:
>>>> -My DV (called Target_E2_pref) is a binary categorical variable.
>>>> -There are two categorical IVs: Listener (2 levels) and SyntaxType (3
>>>> levels).
>>>> -Listener varies by both subject and item (i.e., picture); SyntaxType
>>>> only varies by subject.
>>>>
>>>> When I dummy coded my variables using contr.treatment(), the model with
>>>> the main effect of Listener removed from the fixed effects comes out
>>>> identical to the full model:
>>>>
>>>> SoleTrain = read.table(paste(path, "SoleTrain.dat", sep=""), header=T)
>>>> SoleTrain$Listener.f = factor(SoleTrain$Listener, labels=c("E1", "E2"))
>>>> contrasts(SoleTrain$Listener.f) = contr.treatment(2)
>>>> SoleTrain$SyntaxType.f = factor(SoleTrain$SyntaxType,
>>>>                                 labels=c("Transitive", "Locative", "Dative"))
>>>> contrasts(SoleTrain$SyntaxType.f) = contr.treatment(3)
>>>>
>>>> # Create full model:
>>>> SoleTrain.full <- glmer(Target_E2_pref ~ Listener.f*SyntaxType.f +
>>>>                           (1 + Listener.f*SyntaxType.f|Subject) +
>>>>                           (1 + Listener.f|Picture),
>>>>                         data = SoleTrain, family = binomial, verbose=T,
>>>>                         control=glmerControl(optCtrl=list(maxfun=20000)))
>>>>
>>>> # Create model with main effect of Listener removed:
>>>> SoleTrain.noListener <- glmer(Target_E2_pref ~ SyntaxType.f +
>>>>                                 Listener.f:SyntaxType.f +
>>>>                                 (1 + Listener.f*SyntaxType.f|Subject) +
>>>>                                 (1 + Listener.f|Picture),
>>>>                               data = SoleTrain, family = binomial, verbose=T,
>>>>                               control=glmerControl(optCtrl=list(maxfun=20000)))
>>>>
>>>> > anova(SoleTrain.full, SoleTrain.noListener)
>>>> Data: SoleTrain
>>>> Models:
>>>> SoleTrain.full: Target_E2_pref ~ Listener.f * SyntaxType.f + (1 + Listener.f * SyntaxType.f | Subject) + (1 + Listener.f | Picture)
>>>> SoleTrain.noListener: Target_E2_pref ~ SyntaxType.f + Listener.f:SyntaxType.f + (1 + Listener.f * SyntaxType.f | Subject) + (1 + Listener.f | Picture)
>>>>                      Df    AIC    BIC  logLik deviance Chisq Chi Df Pr(>Chisq)
>>>> SoleTrain.full       30 2732.5 2908.5 -1336.2   2672.5
>>>> SoleTrain.noListener 30 2732.5 2908.5 -1336.2   2672.5     0      0          1
>>>>
>>>> However, I don't have this problem when I test for the main effect of
>>>> SyntaxType, and remove the SyntaxType.f factor from the fixed effects.
>>>> (That is, this produces a different model than the full model.)
>>>>
>>>> Someone suggested that Helmert coding was better for factors with more
>>>> than two levels, so I tried running the same models with Helmert coding
>>>> [contrasts(SoleTrain$SyntaxType.f) = contr.helmert(3)], but the models
>>>> come out identical to the way they do with dummy coding. So why is the
>>>> model with the main effect of Listener removed identical to the model
>>>> with the main effect of Listener retained?
>>>>
>>>> Any suggestions as to what I'm doing wrong?
>>>>
>>>> Thanks!
>>>> Rachel
>>>>
>>>>
>>>>
>>>
