You know, I'm not quite able to describe the null hypothesis here correctly. But as Florian originally said, it's something that would rarely if ever make sense to test.
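To make the quantity under discussion concrete, here is a minimal sketch (toy data and variable names invented for illustration, not from the thread): with a sum-coded factor Y, the coefficient on a numeric X in a model containing the X:Y interaction is the unweighted average of the per-level slopes of X, which can be near zero even when every level of Y shows a strong X effect.

```r
# Sketch with invented data: the "main effect" of X being tested is the
# unweighted average of the per-level slopes of X across the levels of Y.
set.seed(1)
d <- expand.grid(X = seq(-1, 1, length.out = 20), Y = factor(c("a", "b", "c")))
slopes <- c(1, 2, -3)                      # true slope of X within each level of Y
d$out <- slopes[d$Y] * d$X + rnorm(nrow(d), sd = 0.01)
contrasts(d$Y) <- contr.sum(3)             # sum coding: 0 = unweighted mean of the levels
fit <- lm(out ~ X * Y, data = d)
coef(fit)["X"]                             # ~ mean(slopes) = 0, despite strong per-level effects
```

That near-zero coefficient is the null hypothesis Dan is calling ludicrous: every level of Y has a large X effect, yet their unweighted average is zero.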


On Sep 20, 2016, at 1:08 PM, Daniel Ezra Johnson <danielezrajohn...@gmail.com> wrote:

Plus a constant main effect of X. Still ludicrous.

On Sep 20, 2016, at 1:02 PM, Daniel Ezra Johnson <danielezrajohn...@gmail.com> wrote:

I mean, the effect of X averaged across the levels of Y. Why would that be zero?

On Sep 20, 2016, at 1:00 PM, Daniel Ezra Johnson <danielezrajohn...@gmail.com> wrote:

Say X is numeric and Y is a factor. You're testing the hypothesis that the unweighted average, across the levels of Y, of the effect of X equals zero. This is a pretty ludicrous null hypothesis.

On Sep 20, 2016, at 12:27 PM, T. Florian Jaeger <timegu...@gmail.com> wrote:

Guys,

Just a quick note, in case it's not apparent to everyone (I had emailed this earlier to Rachel): what happens in Rachel's model is simply that R defaults to simple-effects coding when a "main" effect is removed while the interaction is still included (note that this, I think, overrides whatever contrasts you have specified for the factor you remove). That's actually a very useful default. To me, the thing that was puzzling at first is the same thing that Roger commented on: it should work the same whether the factor you remove has two levels or three. Indeed, when I tried to replicate Rachel's problem, I did get the same simple-effects reparameterization regardless of how many levels the removed factor has.

Florian

On Tue, Sep 20, 2016 at 2:41 PM, Wednesday Bushong <wednesday.bush...@gmail.com> wrote:

Let me also say something w.r.t. coding, because I think you also expressed doubt about what kind of coding scheme to use.
The crucial thing to remember when interpreting coefficients from an R model summary is that a coefficient gives the effect of moving from 0 to 1 on that particular variable while the other variables are held at 0.

Under dummy coding, then, the "main effect" of Listener is actually the difference in log-odds between the first and second levels of Listener when the two SyntaxType dummy variables are at 0 -- that is, when SyntaxType is at its first level. So this is really just a pairwise comparison between two groups, and it says nothing about the average effect of Listener across the SyntaxType groups. To make the Listener coefficient refer to the average over all SyntaxType groups, you would have to contrast-code SyntaxType (because then 0 is the average of all the levels). Similar interpretations hold for the other terms in a fully dummy-coded model: each SyntaxType effect is interpreted w.r.t. the reference level of Listener, and each Listener:SyntaxType term is the effect of Listener at one of the other SyntaxType levels. Notice that this isn't even close to what we would normally conceptualize as an "interaction" -- so be careful with coding!

Of course, you can mix and match coding schemes. For instance, if you want the main effect of Listener at the average of SyntaxType but pairwise comparisons of SyntaxType within one particular Listener group, you could contrast-code SyntaxType and dummy-code Listener. In general, though, the most common choice is to contrast-code all factors, which gives you the standard ANOVA output interpretation.
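Wednesday's point about what the coefficient means under each scheme can be checked directly. The sketch below uses invented data (all names hypothetical) in which the Listener effect is 0, 2, and 4 in the three SyntaxType groups:

```r
# Toy data: the effect of Listener is 0 at S1, 2 at S2, and 4 at S3.
set.seed(2)
d <- expand.grid(Listener = factor(c("L1", "L2")),
                 Syntax   = factor(c("S1", "S2", "S3")),
                 rep      = 1:50)
eff <- c(0, 2, 4)                          # Listener effect within each Syntax level
d$y <- ifelse(d$Listener == "L2", eff[d$Syntax], 0) + rnorm(nrow(d), sd = 0.01)

# Dummy-coded Syntax: the Listener coefficient is the simple effect at the
# reference level (S1) only.
contrasts(d$Syntax) <- contr.treatment(3)
dummy.fit <- lm(y ~ Listener * Syntax, data = d)
coef(dummy.fit)["ListenerL2"]              # ~0: effect at Syntax = S1

# Sum-coded (contrast-coded) Syntax: the Listener coefficient is the
# unweighted average of the three within-level effects.
contrasts(d$Syntax) <- contr.sum(3)
sum.fit <- lm(y ~ Listener * Syntax, data = d)
coef(sum.fit)["ListenerL2"]                # ~2: mean of 0, 2, 4
```

The same logic carries over to the log-odds coefficients of a glmer fit; lm is used here only to keep the sketch fast and deterministic.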
-Wed

On Tue, Sep 20, 2016 at 1:58 PM, Wednesday Bushong <wednesday.bush...@gmail.com> wrote:

Hi Rachel,

I think at times like this it's useful to look at exactly how R codes factors. When you add interactions, R does a lot of behind-the-scenes work that isn't immediately apparent. One way to look into this in more detail is the really nice function model.matrix, which, given a model formula and a data frame, shows you all of the coding variables that are created in order to fit the model and their values for each combination of factors in the dataset:

# create data frame w/ each factor level combo
d <- data.frame(Listener.f = rep(c("Listener1", "Listener2"), 3),
                SyntaxType.f = c(rep("Syntax1", 2), rep("Syntax2", 2), rep("Syntax3", 2)),
                Target_E2_pref = rnorm(6))

# make factors
d$Listener.f <- factor(d$Listener.f)
d$SyntaxType.f <- factor(d$SyntaxType.f)

# create model formulas corresponding to the full and reduced models
mod.formula <- formula(~ 1 + Listener.f * SyntaxType.f, d)
mod.formula.reduced <- formula(~ 1 + SyntaxType.f + Listener.f:SyntaxType.f, d)

# get variable assignments for all factor level combos
mod.matrix <- model.matrix(mod.formula, d)
mod.matrix.reduced <- model.matrix(mod.formula.reduced, d)

If you look at mod.matrix and mod.matrix.reduced, you'll see that they have the same dimensionality. Digging in further, we can see why this is.
Let's look at the column names of each model matrix:

colnames(mod.matrix)
[1] "(Intercept)"
[2] "Listener.fListener2"
[3] "SyntaxType.fSyntax2"
[4] "SyntaxType.fSyntax3"
[5] "Listener.fListener2:SyntaxType.fSyntax2"
[6] "Listener.fListener2:SyntaxType.fSyntax3"

colnames(mod.matrix.reduced)
[1] "(Intercept)"
[2] "SyntaxType.fSyntax2"
[3] "SyntaxType.fSyntax3"
[4] "SyntaxType.fSyntax1:Listener.fListener2"
[5] "SyntaxType.fSyntax2:Listener.fListener2"
[6] "SyntaxType.fSyntax3:Listener.fListener2"

The interaction columns are where the two matrices differ. Now don't ask me why, but the way R handles subtracting a main effect from a model while keeping the interaction is to add another interaction dummy variable that makes the two models equivalent. (If you look at the values each factor combination takes on, you'll see that this particular dummy variable, SyntaxType.fSyntax1:Listener.fListener2, is 1 when Listener = Listener2 and SyntaxType = Syntax1, and 0 otherwise.)

The way to solve this is presented in the paper Roger linked above (p. 4 being the most relevant here).
His particular example uses contrast coding, but you can make it work in exactly the same way with dummy coding (just make sure that dummy coding is what you really want given the specific hypothesis you're testing!):

# make numeric versions of the factors
d$Listener.numeric <- sapply(d$Listener.f, function(i) contr.treatment(2)[i, ])  # can easily replace w/ whatever coding scheme you want
d$Syntax1.numeric <- sapply(d$SyntaxType.f, function(i) contr.treatment(3)[i, ])[1, ]
d$Syntax2.numeric <- sapply(d$SyntaxType.f, function(i) contr.treatment(3)[i, ])[2, ]

# check the model matrix
mod.formula.new <- formula(~ 1 + Syntax1.numeric + Syntax2.numeric +
                             Listener.numeric:Syntax1.numeric + Listener.numeric:Syntax2.numeric, d)
mod.matrix.new <- model.matrix(mod.formula.new, d)

colnames(mod.matrix.new)
[1] "(Intercept)"
[2] "Syntax1.numeric"
[3] "Syntax2.numeric"
[4] "Syntax1.numeric:Listener.numeric"
[5] "Syntax2.numeric:Listener.numeric"

Now things are as they should be: no more mysterious extra dummy variable carrying information about the main effect of Listener! This last model is the one you should compare your original model against to get the significance of the main effect of Listener.

Hope this was helpful!

Best,
Wednesday

On Tue, Sep 20, 2016 at 12:26 PM, Levy, Roger <rl...@ucsd.edu> wrote:

Hi Dan,

I'm having a bit of trouble figuring out exactly how your two comments comport with one another, but I think the crucial point here is that the procedure I outline in the paper is simply how to do exactly what is done in traditional ANOVA analyses.
In this approach, the expected effect size of, for example, ListenerType does not depend on the relative amounts of data in the various levels of SyntaxType (which is what I think you're referring to by "balance of the levels").

Your caveat regarding whether the main effect of a factor X1 necessarily has a sensible interpretation in the presence of the interaction between X1 and X2 is certainly appropriate; at the beginning of the paper I have a few remarks on the caution that should be applied. I do think that for factorial ANOVA analyses the main effect can often have a useful interpretation as the "across-the-board" effect that X1 has, regardless of the value of X2 (which, once again, is the traditional ANOVA interpretation of a main effect).

If my responses to your comments aren't on target, I would be very glad for clarification!

Best

Roger

On Sep 19, 2016, at 7:31 PM, Daniel Ezra Johnson <danielezrajohn...@gmail.com> wrote:

If you follow this procedure, though, you'd be testing for the effect of Listener when SyntaxType is "in the middle" (unweighted) of its three levels. This quantity not only has no sensible interpretation, it also depends on the balance of the levels of SyntaxType in the data.

If the effect of Listener is in the same direction for each level of SyntaxType, such a test might be useful, but otherwise I don't think it would be.

Dan

On Sep 19, 2016, at 8:00 PM, Daniel Ezra Johnson <danielezrajohn...@gmail.com> wrote:

> it also depends on the balance of the levels of SyntaxType in the data.

Well, not quite.
The idea of testing the "middle level" is right, and what this means doesn't change if the balance of data changes across levels...

But you could have two data sets in which the effects of Listener for each level of SyntaxType are the same (between the two data sets), yet the significance of this test changes…

On Mon, Sep 19, 2016 at 2:45 PM, Levy, Roger <rl...@ucsd.edu> wrote:

Hi Rachel,

If your goal is to test the main effect of Listener in the presence of the Listener:SyntaxType interaction, as would typically be done in traditional ANOVA analyses, I recommend you read this brief paper I wrote a few years ago on how to do exactly that:

http://arxiv.org/abs/1405.2094

It is targeted at exactly this problem, and it explains why you're getting the behavior you report: it comes from differences in how R treats factors versus numeric variables in formulae. (Setting the contrasts on the factor has no impact.)

I have no explanation for why you don't get this problem when you test for the main effect of SyntaxType; if you give further details, we might be able to help further!

Best

Roger

On Sep 18, 2016, at 5:57 PM, Rachel Ostrand <rostr...@cogsci.ucsd.edu> wrote:

Hi everyone,

I'm having trouble with some two-factor glmer models I'm trying to run: a model with one of the main effects removed is coming out identical to the full model.
Some colleagues suggested that this might be due to the coding of my factors, specifically because I have a factor with 3 levels, and that one needs to be treated differently, but I'm not sure how -- or why -- to do that.

Brief summary of my data:
- My DV (called Target_E2_pref) is a binary categorical variable.
- There are two categorical IVs: Listener (2 levels) and SyntaxType (3 levels).
- Listener varies by both subject and item (i.e., picture); SyntaxType varies only by subject.

When I dummy-code my variables using contr.treatment(), the model with the main effect of Listener removed from the fixed effects comes out identical to the full model:

SoleTrain = read.table(paste(path, "SoleTrain.dat", sep=""), header=T)
SoleTrain$Listener.f = factor(SoleTrain$Listener, labels=c("E1", "E2"))
contrasts(SoleTrain$Listener.f) = contr.treatment(2)
SoleTrain$SyntaxType.f = factor(SoleTrain$SyntaxType, labels=c("Transitive", "Locative", "Dative"))
contrasts(SoleTrain$SyntaxType.f) = contr.treatment(3)

# Create the full model:
SoleTrain.full <- glmer(Target_E2_pref ~ Listener.f*SyntaxType.f +
                          (1 + Listener.f*SyntaxType.f|Subject) + (1 + Listener.f|Picture),
                        data = SoleTrain, family = binomial, verbose = T,
                        control = glmerControl(optCtrl = list(maxfun = 20000)))

# Create the model with the main effect of Listener removed:
SoleTrain.noListener <- glmer(Target_E2_pref ~ SyntaxType.f + Listener.f:SyntaxType.f +
                                (1 + Listener.f*SyntaxType.f|Subject) + (1 + Listener.f|Picture),
                              data = SoleTrain, family = binomial, verbose = T,
                              control = glmerControl(optCtrl = list(maxfun = 20000)))

> anova(SoleTrain.full,
SoleTrain.noListener)
Data: SoleTrain
Models:
SoleTrain.full: Target_E2_pref ~ Listener.f * SyntaxType.f + (1 + Listener.f * SyntaxType.f | Subject) + (1 + Listener.f | Picture)
SoleTrain.noListener: Target_E2_pref ~ SyntaxType.f + Listener.f:SyntaxType.f + (1 + Listener.f * SyntaxType.f | Subject) + (1 + Listener.f | Picture)
                     Df    AIC    BIC  logLik deviance Chisq Chi Df Pr(>Chisq)
SoleTrain.full       30 2732.5 2908.5 -1336.2   2672.5
SoleTrain.noListener 30 2732.5 2908.5 -1336.2   2672.5     0      0          1

However, I don't have this problem when I test for the main effect of SyntaxType and remove SyntaxType.f from the fixed effects; that produces a different model from the full model.

Someone suggested that Helmert coding was better for factors with more than two levels, so I tried running the same models with Helmert coding [contrasts(SoleTrain$SyntaxType.f) = contr.helmert(3)], but they come out identical to the way they do with dummy coding. So why is the model with the main effect of Listener removed the same as the model with the main effect of Listener retained?

Any suggestions as to what I'm doing wrong?

Thanks!
Rachel
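Tying the thread together, here is a minimal sketch of the fix that Roger's paper and Wednesday's message describe, applied to a toy version of Rachel's design (the data and object names are illustrative, not her actual data): once the factors are converted to numeric contrast columns, dropping the Listener main effect genuinely removes a column from the model matrix instead of being silently reparameterized away.

```r
# Toy version of the Listener (2 levels) x SyntaxType (3 levels) design.
d <- expand.grid(Listener = factor(c("E1", "E2")),
                 SyntaxType = factor(c("Transitive", "Locative", "Dative")))

# Convert the factors to numeric contrast columns by hand (sum coding here;
# any contrast scheme works the same way).
d$Listener.n <- contr.sum(2)[d$Listener, ]
d$Syntax1.n  <- contr.sum(3)[d$SyntaxType, 1]
d$Syntax2.n  <- contr.sum(3)[d$SyntaxType, 2]

# With factors, the reduced model gets an extra interaction column and ends up
# with as many columns as the full model; with numeric predictors it does not.
full    <- model.matrix(~ Listener.n * (Syntax1.n + Syntax2.n), d)
reduced <- model.matrix(~ Syntax1.n + Syntax2.n +
                          Listener.n:Syntax1.n + Listener.n:Syntax2.n, d)
ncol(full)     # 6
ncol(reduced)  # 5: the Listener main-effect column is genuinely gone
```

Fitting glmer with numeric predictors like these in place of Listener.f and SyntaxType.f, and comparing the full and reduced fits with anova(), then gives a non-degenerate likelihood-ratio test of the Listener main effect.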