You know, I'm not quite able to describe the null hypothesis here correctly. 
But as Florian originally said, it's something that would rarely if ever make 
sense to test. 

> On Sep 20, 2016, at 1:08 PM, Daniel Ezra Johnson 
> <danielezrajohn...@gmail.com> wrote:
> 
> Plus a constant main effect of X. Still ludicrous. 
> 
>> On Sep 20, 2016, at 1:02 PM, Daniel Ezra Johnson 
>> <danielezrajohn...@gmail.com> wrote:
>> 
>> I mean, the effect of X averaged across the levels of Y. Why would that be 
>> zero?
>> 
>>> On Sep 20, 2016, at 1:00 PM, Daniel Ezra Johnson 
>>> <danielezrajohn...@gmail.com> wrote:
>>> 
>>> Say X is numeric. Y is a factor. You're testing the hypothesis that the 
>>> unweighted average of the effects of the levels of Y equals zero. This is a 
>>> pretty ludicrous null hypothesis. 
>>> 
>>>> On Sep 20, 2016, at 12:27 PM, T. Florian Jaeger <timegu...@gmail.com> 
>>>> wrote:
>>>> 
>>>> Guys,
>>>> 
>>>> just a quick note, in case it's not apparent to everyone (I had emailed 
>>>> this earlier to Rachel): what happens in Rachel's model is simply that R 
>>>> defaults to simple effects coding when a 'main' effect is removed while 
>>>> the interaction is still included (note that this, I think, overrides 
>>>> whatever contrasts you have specified for the factor you remove). That's 
>>>> actually a very useful default. To me, the thing that was puzzling at 
>>>> first is the same thing that Roger commented on: it should be just the 
>>>> same whether you remove a two-level or a three-level factor. Indeed, when 
>>>> I tried to replicate Rachel's problem, I did/do get the same (simple 
>>>> effects reparameterization) regardless of how many levels the removed 
>>>> factor has.
>>>> 
>>>> Florian
>>>> 
>>>>> On Tue, Sep 20, 2016 at 2:41 PM Wednesday Bushong 
>>>>> <wednesday.bush...@gmail.com> wrote:
>>>>> Let me also say something w.r.t. coding because I think you also 
>>>>> expressed doubt about what kind of coding scheme to use.
>>>>> 
>>>>> The crucial thing to remember when interpreting coefficients from an R 
>>>>> model summary is that each coefficient represents the change in the 
>>>>> outcome as that variable moves from 0 to 1, with the values of the other 
>>>>> variables held at 0.
>>>>> 
>>>>> In the case of dummy coding, then, the "main effect" of Listener is 
>>>>> actually the difference in log-odds between the first and second levels 
>>>>> of Listener when the two SyntaxType dummy variables are at 0 -- that is, 
>>>>> when SyntaxType is at its first level. So this is really just a pairwise 
>>>>> comparison between two groups, and doesn't say anything about the average 
>>>>> effect of Listener across the SyntaxType groups. In order for Listener to 
>>>>> be interpreted at the average of all SyntaxType groups, you would have to 
>>>>> contrast code SyntaxType (because then 0 is the average of all the 
>>>>> levels). Similar interpretations in a fully dummy-coded model go for the 
>>>>> other main effect terms (i.e., each SyntaxType effect is interpreted at 
>>>>> the reference level of Listener) and for the interaction terms (each 
>>>>> Listener:SyntaxType term is the difference in the Listener effect between 
>>>>> that SyntaxType level and the reference level; notice that this is not 
>>>>> the omnibus interaction we would normally conceptualize in an ANOVA! So 
>>>>> be careful with coding!).
>>>>> 
>>>>> Of course, you can mix and match your coding schemes -- for instance, if 
>>>>> you want to get the main effect of Listener at the avg. of SyntaxType but 
>>>>> wanted pairwise comparisons of SyntaxType within one particular Listener 
>>>>> group, you could contrast code SyntaxType and dummy code Listener 
>>>>> appropriately -- but in general, the most common thing to do will be 
>>>>> contrast coding all factors, which will give you the standard ANOVA 
>>>>> output interpretation.
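>>>>> 
>>>>> As a minimal sketch of why contrast coding gives the "at the average" 
>>>>> interpretation (the factor here is made up for illustration, not taken 
>>>>> from Rachel's data): with sum coding, every column of the contrast 
>>>>> matrix sums to zero across the levels, so 0 on each coding variable 
>>>>> corresponds to the unweighted average of the levels.
>>>>> 
>>>>> f <- factor(c("Syntax1", "Syntax2", "Syntax3"))
>>>>> contrasts(f) <- contr.sum(3)
>>>>> contrasts(f)
>>>>> #         [,1] [,2]
>>>>> # Syntax1    1    0
>>>>> # Syntax2    0    1
>>>>> # Syntax3   -1   -1
>>>>> 
>>>>> Any effect evaluated "when the other variables are at 0" is then 
>>>>> evaluated at the grand mean of this factor rather than at its reference 
>>>>> level.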
>>>>> 
>>>>> -Wed
>>>>> 
>>>>>> On Tue, Sep 20, 2016 at 1:58 PM Wednesday Bushong 
>>>>>> <wednesday.bush...@gmail.com> wrote:
>>>>>> Hi Rachel,
>>>>>> 
>>>>>> I think at times like this it's useful to look at exactly how R codes 
>>>>>> factors. When you add interactions, R does a lot of behind-the-scenes 
>>>>>> work that isn't immediately apparent. One way to look into this in more 
>>>>>> detail is the really nice function "model.matrix", which, given a data 
>>>>>> frame and a model formula, will show you all of the coding variables 
>>>>>> that are created in order to fit the model and their values for each 
>>>>>> combination of factors in the dataset.
>>>>>> 
>>>>>> # create data frame w/ each factor level combo
>>>>>> d <- data.frame(Listener.f = rep(c("Listener1", "Listener2"), 3),
>>>>>>                 SyntaxType.f = c(rep("Syntax1", 2), rep("Syntax2", 2),
>>>>>>                                  rep("Syntax3", 2)),
>>>>>>                 Target_E2_pref = rnorm(6))
>>>>>> # make factor
>>>>>> d$Listener.f <- factor(d$Listener.f)
>>>>>> d$SyntaxType.f <- factor(d$SyntaxType.f)
>>>>>> 
>>>>>> # create model formulas corresponding to full and reduced model
>>>>>> mod.formula <- formula(~ 1 + Listener.f * SyntaxType.f, d)
>>>>>> mod.formula.reduced <- formula(~ 1 + SyntaxType.f +
>>>>>>                                  Listener.f:SyntaxType.f, d)
>>>>>> # get var assignments for all factor level combos
>>>>>> mod.matrix <- model.matrix(mod.formula, d)
>>>>>> mod.matrix.reduced <- model.matrix(mod.formula.reduced, d)
>>>>>> 
>>>>>> If you look at mod.matrix and mod.matrix.reduced, you'll see that they 
>>>>>> each have the same dimensionality. Digging in further, we can see why 
>>>>>> this is. Let's look at the column names of each model matrix:
>>>>>> 
>>>>>> colnames(mod.matrix)
>>>>>> [1] "(Intercept)"
>>>>>> [2] "Listener.fListener2"
>>>>>> [3] "SyntaxType.fSyntax2"
>>>>>> [4] "SyntaxType.fSyntax3"
>>>>>> [5] "Listener.fListener2:SyntaxType.fSyntax2"
>>>>>> [6] "Listener.fListener2:SyntaxType.fSyntax3"
>>>>>> 
>>>>>> colnames(mod.matrix.reduced)
>>>>>> [1] "(Intercept)"
>>>>>> [2] "SyntaxType.fSyntax2"
>>>>>> [3] "SyntaxType.fSyntax3"
>>>>>> [4] "SyntaxType.fSyntax1:Listener.fListener2"
>>>>>> [5] "SyntaxType.fSyntax2:Listener.fListener2"
>>>>>> [6] "SyntaxType.fSyntax3:Listener.fListener2"
>>>>>> 
>>>>>> The difference is in the interaction columns. Now don't ask me why, but 
>>>>>> the way that R appears to handle subtracting a main effect from a model 
>>>>>> while keeping the interaction is to add in another interaction dummy 
>>>>>> variable that makes the model equivalent. (If you look at the values 
>>>>>> that each factor combo takes on, you'll see that this particular dummy 
>>>>>> variable -- SyntaxType.fSyntax1:Listener.fListener2 -- is 1 when 
>>>>>> Listener = Listener2 and SyntaxType = Syntax1, and 0 otherwise, so it 
>>>>>> carries exactly the information of the removed Listener main effect.)
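>>>>>> 
>>>>>> A minimal sketch using the objects defined above shows this directly -- 
>>>>>> the extra column is just an indicator for the Listener2/Syntax1 cell:
>>>>>> 
>>>>>> mod.matrix.reduced[, "SyntaxType.fSyntax1:Listener.fListener2"]
>>>>>> # 1 for the row of d where Listener = Listener2 and SyntaxType = 
>>>>>> # Syntax1; 0 for the other five rows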
>>>>>> 
>>>>>> The way to solve this is presented in the paper Roger linked above (p. 
>>>>>> 4 being the most relevant here). His particular example uses contrast 
>>>>>> coding, but you can make it work in exactly the same way with dummy 
>>>>>> coding (just make sure that dummy coding is what you really want given 
>>>>>> the specific hypothesis you're testing!):
>>>>>> 
>>>>>> # make numeric versions of factors
>>>>>> # (can easily replace contr.treatment w/ whatever coding scheme you want)
>>>>>> d$Listener.numeric <- sapply(d$Listener.f,
>>>>>>                              function(i) contr.treatment(2)[i, ])
>>>>>> d$Syntax1.numeric <- sapply(d$SyntaxType.f,
>>>>>>                             function(i) contr.treatment(3)[i, ])[1, ]
>>>>>> d$Syntax2.numeric <- sapply(d$SyntaxType.f,
>>>>>>                             function(i) contr.treatment(3)[i, ])[2, ]
>>>>>> 
>>>>>> # check model matrix
>>>>>> mod.formula.new <- formula(~ 1 + Syntax1.numeric + Syntax2.numeric +
>>>>>>                              Listener.numeric:Syntax1.numeric +
>>>>>>                              Listener.numeric:Syntax2.numeric, d)
>>>>>> mod.matrix.new <- model.matrix(mod.formula.new, d)
>>>>>> colnames(mod.matrix.new)
>>>>>> 
>>>>>> [1] "(Intercept)"                     
>>>>>> [2] "Syntax1.numeric"
>>>>>> [3] "Syntax2.numeric"                  
>>>>>> [4] "Syntax1.numeric:Listener.numeric"
>>>>>> [5] "Syntax2.numeric:Listener.numeric"
>>>>>> 
>>>>>> Now things are as they should be: no more mysterious extra dummy 
>>>>>> variable containing information about the main effect of Listener! This 
>>>>>> last model is the one you should compare your original model against to 
>>>>>> get the significance of the main effect of Listener.
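>>>>>> 
>>>>>> As a final sanity check (a minimal sketch; mod.formula.full.numeric is 
>>>>>> just a hypothetical name for the full numeric-predictor model), you can 
>>>>>> count columns to confirm the reduced model really drops one predictor:
>>>>>> 
>>>>>> mod.formula.full.numeric <- formula(~ 1 + Listener.numeric *
>>>>>>                                       (Syntax1.numeric + Syntax2.numeric),
>>>>>>                                     d)
>>>>>> ncol(model.matrix(mod.formula.full.numeric, d)) # 6 columns
>>>>>> ncol(mod.matrix.new) # 5 columns: Listener's main-effect column is gone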
>>>>>> 
>>>>>> Hope this was helpful!
>>>>>> 
>>>>>> Best,
>>>>>> Wednesday
>>>>>> 
>>>>>>> On Tue, Sep 20, 2016 at 12:26 PM Levy, Roger <rl...@ucsd.edu> wrote:
>>>>>>> Hi Dan,
>>>>>>> 
>>>>>>> I’m having a bit of trouble figuring out exactly how your two comments 
>>>>>>> comport with one another, but I think the crucial point here is that 
>>>>>>> the procedure I outline in the paper is simply how to do exactly what 
>>>>>>> is done in traditional ANOVA analyses.  In this approach, the expected 
>>>>>>> effect size of, for example, ListenerType does not depend on the 
>>>>>>> relative amounts of data in the various levels of SyntaxType (which is 
>>>>>>> what I think you’re referring to by “balance of the levels”).
>>>>>>> 
>>>>>>> Your caveat regarding whether the main effect of a factor X1 
>>>>>>> necessarily has a sensible interpretation in the presence of the 
>>>>>>> interaction between X1 and X2 is certainly appropriate.  In the 
>>>>>>> beginning of the paper I have a few remarks on the caution that should 
>>>>>>> be applied.  I do think that for factorial ANOVA analyses the main 
>>>>>>> effect can often have a useful interpretation as the “across-the-board” 
>>>>>>> effect that X1 has, regardless of the value of X2 (which once again is 
>>>>>>> the traditional ANOVA interpretation  of a main effect).
>>>>>>> 
>>>>>>> If my responses to your comments aren’t on target, I would be very glad 
>>>>>>> for clarification!
>>>>>>> 
>>>>>>> Best
>>>>>>> 
>>>>>>> Roger
>>>>>>> 
>>>>>>> 
>>>>>>>> On Sep 19, 2016, at 7:31 PM, Daniel Ezra Johnson 
>>>>>>>> <danielezrajohn...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>> If you follow this procedure, though, you'd be testing for the effect 
>>>>>>>> of Listener when SyntaxType is "in the middle" (unweighted) of the 
>>>>>>>> three levels. This quantity not only has no sensible interpretation, 
>>>>>>>> it also depends on the balance of the levels of SyntaxType in the data.
>>>>>>>> 
>>>>>>>> If the effect of Listener is in the same direction for each level of 
>>>>>>>> SyntaxType, such a test might be useful, but otherwise I don't think 
>>>>>>>> it would be?
>>>>>>>> 
>>>>>>>> Dan 
>>>>>>> 
>>>>>>>> On Sep 19, 2016, at 8:00 PM, Daniel Ezra Johnson 
>>>>>>>> <danielezrajohn...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> it also depends on the balance of the levels of SyntaxType in the 
>>>>>>>>> data.
>>>>>>>> 
>>>>>>>> Well not quite. The idea of testing the "middle level" is right, and 
>>>>>>>> whatever this means doesn't change if the balance of data changes 
>>>>>>>> across levels...
>>>>>>>> 
>>>>>>>> But you could have two data sets where the effects of listener for 
>>>>>>>> each level of SyntaxType are the same (between the two data sets), but 
>>>>>>>> the significance of this test changes…
>>>>>>> 
>>>>>>>>> On Mon, Sep 19, 2016 at 2:45 PM, Levy, Roger <rl...@ucsd.edu> wrote:
>>>>>>>>> Hi Rachel,
>>>>>>>>> 
>>>>>>>>> If your goal is to test the main effect of Listener in the presence 
>>>>>>>>> of the Listener-SyntaxType interaction, as would typically be done in 
>>>>>>>>> traditional ANOVA analyses, I recommend you read this brief paper I 
>>>>>>>>> wrote a few years ago on how to do this:
>>>>>>>>> 
>>>>>>>>>   http://arxiv.org/abs/1405.2094
>>>>>>>>> 
>>>>>>>>> It is exactly targeted at this problem, and explains why you’re 
>>>>>>>>> getting the behavior you report due to differences in how R treats 
>>>>>>>>> factors versus numeric variables in formulae.  (Setting the contrasts 
>>>>>>>>> on the factor has no impact.)
>>>>>>>>> 
>>>>>>>>> I have no explanation for your reported behavior of why you don’t get 
>>>>>>>>> this problem when you test for the main effect of SyntaxType; if you 
>>>>>>>>> give further details, we might be able to help further!
>>>>>>>>> 
>>>>>>>>> Best
>>>>>>>>> 
>>>>>>>>> Roger
>>>>>>>>> 
>>>>>>>>>> On Sep 18, 2016, at 5:57 PM, Rachel Ostrand 
>>>>>>>>>> <rostr...@cogsci.ucsd.edu> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hi everyone,
>>>>>>>>>> 
>>>>>>>>>> I'm having trouble with some 2-factor glmer models that I'm trying 
>>>>>>>>>> to run, such that the model with one of the main effects removed is 
>>>>>>>>>> coming out identical to the full model. Some colleagues suggested 
>>>>>>>>>> that this might be due to the coding of my factors, specifically 
>>>>>>>>>> because I have a factor that has 3 levels, and that one needs to be 
>>>>>>>>>> treated differently, but I'm not sure how - or why - to do that.
>>>>>>>>>> 
>>>>>>>>>> Brief summary of my data:
>>>>>>>>>> -My DV (called Target_E2_pref) is a binary categorical variable.
>>>>>>>>>> -There are two categorical IVs: Listener (2 levels) and SyntaxType 
>>>>>>>>>> (3 levels).
>>>>>>>>>> -Listener varies by both subject and item (i.e., picture); 
>>>>>>>>>> SyntaxType only varies by subject.
>>>>>>>>>> 
>>>>>>>>>> When I dummy coded my variables using contr.treatment(), the model 
>>>>>>>>>> with the main effect of Listener removed from the fixed effects 
>>>>>>>>>> comes out identical to the full model:
>>>>>>>>>> 
>>>>>>>>>> SoleTrain = read.table(paste(path, "SoleTrain.dat", sep=""), header=T)
>>>>>>>>>> SoleTrain$Listener.f = factor(SoleTrain$Listener, labels=c("E1", "E2"))
>>>>>>>>>> contrasts(SoleTrain$Listener.f) = contr.treatment(2)
>>>>>>>>>> SoleTrain$SyntaxType.f = factor(SoleTrain$SyntaxType,
>>>>>>>>>>   labels=c("Transitive", "Locative", "Dative"))
>>>>>>>>>> contrasts(SoleTrain$SyntaxType.f) = contr.treatment(3)
>>>>>>>>>> 
>>>>>>>>>> # Create full model:
>>>>>>>>>> SoleTrain.full <- glmer(Target_E2_pref ~ Listener.f*SyntaxType.f +
>>>>>>>>>>   (1 + Listener.f*SyntaxType.f|Subject) + (1 + Listener.f|Picture),
>>>>>>>>>>   data = SoleTrain, family = binomial, verbose=T,
>>>>>>>>>>   control=glmerControl(optCtrl=list(maxfun=20000)))
>>>>>>>>>> 
>>>>>>>>>> # Create model with main effect of Listener removed:
>>>>>>>>>> SoleTrain.noListener <- glmer(Target_E2_pref ~ SyntaxType.f +
>>>>>>>>>>   Listener.f:SyntaxType.f + (1 + Listener.f*SyntaxType.f|Subject) +
>>>>>>>>>>   (1 + Listener.f|Picture), data = SoleTrain, family = binomial,
>>>>>>>>>>   verbose=T, control=glmerControl(optCtrl=list(maxfun=20000)))
>>>>>>>>>> 
>>>>>>>>>> > anova(SoleTrain.full, SoleTrain.noListener)
>>>>>>>>>> Data: SoleTrain
>>>>>>>>>> Models:
>>>>>>>>>> SoleTrain.full: Target_E2_pref ~ Listener.f * SyntaxType.f + (1 +
>>>>>>>>>>     Listener.f * SyntaxType.f | Subject) + (1 + Listener.f | Picture)
>>>>>>>>>> SoleTrain.noListener: Target_E2_pref ~ SyntaxType.f +
>>>>>>>>>>     Listener.f:SyntaxType.f + (1 + Listener.f * SyntaxType.f | Subject)
>>>>>>>>>>     + (1 + Listener.f | Picture)
>>>>>>>>>>                      Df    AIC    BIC  logLik deviance Chisq Chi Df Pr(>Chisq)
>>>>>>>>>> SoleTrain.full       30 2732.5 2908.5 -1336.2   2672.5
>>>>>>>>>> SoleTrain.noListener 30 2732.5 2908.5 -1336.2   2672.5     0      0          1
>>>>>>>>>> 
>>>>>>>>>> However, I don't have this problem when I test for the main effect 
>>>>>>>>>> of SyntaxType, and remove the SyntaxType.f factor from the fixed 
>>>>>>>>>> effects. (That is, this produces a different model than the full 
>>>>>>>>>> model.)
>>>>>>>>>> 
>>>>>>>>>> Someone suggested that Helmert coding was better for factors with 
>>>>>>>>>> more than two levels, so I tried running the same models with 
>>>>>>>>>> Helmert coding [contrasts(SoleTrain$SyntaxType.f) = 
>>>>>>>>>> contr.helmert(3)], but the models come out identical to the way they 
>>>>>>>>>> do with dummy coding. So why does the model with the main effect of 
>>>>>>>>>> Listener removed come out the same as the model with the main effect 
>>>>>>>>>> of Listener retained?
>>>>>>>>>> 
>>>>>>>>>> Any suggestions as to what I'm doing wrong?
>>>>>>>>>> 
>>>>>>>>>> Thanks!
>>>>>>>>>> Rachel
>>>>>>>>> 
>>>>>>>> 
