Dear all,

I am somewhat late to the party (I was traveling until yesterday), but I would also like to add my two cents to this discussion, especially as afex was mentioned here.

afex::mixed() indeed does essentially what is suggested here: it creates numerical variables from your factors and fits the models in a sequential manner so that you get tests of all effects. To get the likelihood ratio tests you want, you need to set method = "LRT" in the call to mixed() (the default method, "KR", uses the Kenward-Roger approximation for p-values, which is not appropriate for GLMMs; "PB", the parametric bootstrap, can also be used for GLMMs).
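
For example, a sketch using Rachel's variable names (untested on her data; note that mixed() resets the contrasts to contr.sum itself):

library(afex)
## one likelihood ratio test per effect; specifying family = binomial
## makes mixed() fit the models via glmer()
m <- mixed(Target_E2_pref ~ Listener.f * SyntaxType.f +
             (1 + Listener.f * SyntaxType.f | Subject) +
             (1 + Listener.f | Picture),
           data = SoleTrain, family = binomial, method = "LRT")
m  # prints the table of LRTs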

In contrast to Dale's suggestion, mixed() by default uses R's built-in factor type, specifically with contr.sum() (i.e., sum-to-zero) contrasts, which use only 1, 0, and -1 as values. To create the numeric variables it then uses model.matrix() and works from there.
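
For example:

f <- gl(3, 2)                 # a 3-level factor
contrasts(f) <- contr.sum(3)  # sum-to-zero coding
contrasts(f)
##   [,1] [,2]
## 1    1    0
## 2    0    1
## 3   -1   -1
model.matrix(~ f)             # columns f1 and f2 are the numeric predictors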

I am not quite sure I agree with Daniel's point. The described procedure tests the null hypothesis of an effect at the (unweighted) mean of the other effects. I do not see how this should be nonsensical. In a factorial design this amounts to assuming that the other factors have no effect (i.e., that there are no deviations from the unweighted grand mean produced by the levels of the other factors).
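
A quick made-up illustration with lm(), using a saturated 2x2 model with unbalanced cells:

d <- expand.grid(A = gl(2, 1), B = gl(2, 1))
d <- d[rep(1:4, c(10, 10, 10, 30)), ]    # unbalanced cell sizes
d$y <- rnorm(nrow(d))
contrasts(d$A) <- contr.sum(2)
contrasts(d$B) <- contr.sum(2)
cm <- tapply(d$y, list(d$A, d$B), mean)  # cell means
coef(lm(y ~ A * B, data = d))["A1"]      # the tested effect of A equals ...
mean(cm[1, ]) - mean(cm)                 # ... A1 at the unweighted mean of B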

Finally, as many have noted, the problem is that one cannot remove lower-order effects in R via the formula interface. A quick example (using lm) can be found here: http://stackoverflow.com/q/11335923/289572 This behavior unfortunately generalizes to all interaction orders and also to all model types (lm, lmer, glmer, ...). It is also the reason why I developed afex::mixed(), to solve exactly this problem. If there are other modeling functions that are as useful as lmer or glmer, please let me know so I can make sure to also add the automatic testing for those functions.
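
For completeness, a minimal made-up version of the behavior behind that link:

d <- data.frame(y = rnorm(40), a = gl(2, 20), b = gl(2, 1, 40))
m1 <- lm(y ~ a * b, data = d)
m2 <- lm(y ~ b + a:b, data = d)  # 'a' removed from the formula ...
anova(m1, m2)                    # ... yet 0 df difference: identical fits
## R silently re-codes 'a' inside a:b, so both models span the same
## design space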

In contrast to Dale, I believe that the probability of committing an error when creating the numeric variables by hand is higher than when letting an algorithm do it.

Cheers,
Henrik



On 19.09.2016 at 13:56, Dale Barr wrote:

Hi Rachel,

I don't recommend using the built-in 'factor' type variables when using
R linear model functions, because weird things happen when you try to
drop terms from a model that includes interactions.  It is better to
create your own numeric predictors, e.g.:

## create centered predictor for listener
## NB: replace 'somelevel' with the name of one of the levels of your variable
SoleTrain$L <- (SoleTrain$Listener == "somelevel") -
  mean(SoleTrain$Listener == "somelevel")

## create 2 centered predictors for SyntaxType
## replace 'level1' and 'level2' with names of 2 levels of your var
SoleTrain$ST1 <- (SoleTrain$SyntaxType == "level1") -
  mean(SoleTrain$SyntaxType == "level1")

SoleTrain$ST2 <- (SoleTrain$SyntaxType == "level2") -
  mean(SoleTrain$SyntaxType == "level2")
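
## optional sanity check: the distinct values of the new predictors
## should match the expectations described below
table(SoleTrain$L)
table(SoleTrain$ST1)
table(SoleTrain$ST2)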

These should give you values of -.5, +.5 for new predictor L, and
something like -.33 and +.67 for each of your two new ST1 and ST2
variables (depending on how balanced your data is).  Check that you are
getting (approximately) the right values.  The centering ensures that
you are looking at main rather than simple effects.  You can then fit
your model as:

mod <- glmer(Target_E2_pref ~ L * (ST1 + ST2) +
  (1 + L * (ST1 + ST2) | Subject) + (1 + L | Picture),
  data = SoleTrain, family = binomial, verbose = TRUE,
  control = glmerControl(optCtrl = list(maxfun = 20000)))

and drop out terms as needed (you might want to look at the update()
function which makes this easier).
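
For example (a sketch, using the predictors created above):

## refit without the main effect of L; because L is numeric, this
## really removes the L column from the design matrix
mod_noL <- update(mod, . ~ . - L)
anova(mod, mod_noL)  # likelihood ratio test for the main effect of L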

Alternative solution: check out the afex package, which (I think) can do
the factor coding and LRTs for you.

-D

Rachel Ostrand writes:

Hi everyone,

I'm having trouble with some 2-factor glmer models that I'm trying to run:
the model with one of the main effects removed comes out identical to the
full model. Some colleagues suggested that this might be due to the coding
of my factors, specifically because I have a factor with 3 levels, and that
that factor needs to be treated differently, but I'm not sure how - or why -
to do that.

Brief summary of my data:
-My DV (called Target_E2_pref) is a binary categorical variable.
-There are two categorical IVs: Listener (2 levels) and SyntaxType (3
levels).
-Listener varies by both subject and item (i.e., picture); SyntaxType only
varies by subject.

When I dummy coded my variables using contr.treatment(), the model with the
main effect of Listener removed from the fixed effects comes out identical
to the full model:

SoleTrain = read.table(paste(path, "SoleTrain.dat", sep=""), header=T)
SoleTrain$Listener.f = factor(SoleTrain$Listener, labels=c("E1", "E2"))
contrasts(SoleTrain$Listener.f) = contr.treatment(2)
SoleTrain$SyntaxType.f = factor(SoleTrain$SyntaxType,
                                labels=c("Transitive", "Locative", "Dative"))
contrasts(SoleTrain$SyntaxType.f) = contr.treatment(3)

# Create full model:
SoleTrain.full <- glmer(Target_E2_pref ~ Listener.f*SyntaxType.f +
    (1 + Listener.f*SyntaxType.f|Subject) + (1 + Listener.f|Picture),
  data = SoleTrain, family = binomial, verbose=T,
  control=glmerControl(optCtrl=list(maxfun=20000)))

# Create model with main effect of Listener removed:
SoleTrain.noListener <- glmer(Target_E2_pref ~ SyntaxType.f +
    Listener.f:SyntaxType.f +
    (1 + Listener.f*SyntaxType.f|Subject) + (1 + Listener.f|Picture),
  data = SoleTrain, family = binomial, verbose=T,
  control=glmerControl(optCtrl=list(maxfun=20000)))

anova(SoleTrain.full, SoleTrain.noListener)
Data: SoleTrain
Models:
SoleTrain.full: Target_E2_pref ~ Listener.f * SyntaxType.f +
    (1 + Listener.f * SyntaxType.f | Subject) + (1 + Listener.f | Picture)
SoleTrain.noListener: Target_E2_pref ~ SyntaxType.f + Listener.f:SyntaxType.f +
    (1 + Listener.f * SyntaxType.f | Subject) + (1 + Listener.f | Picture)
                     Df    AIC    BIC  logLik deviance Chisq Chi Df Pr(>Chisq)
SoleTrain.full       30 2732.5 2908.5 -1336.2   2672.5
SoleTrain.noListener 30 2732.5 2908.5 -1336.2   2672.5     0      0          1

However, I don't have this problem when I test for the main effect of
SyntaxType, and remove the SyntaxType.f factor from the fixed effects.
(That is, this produces a different model than the full model.)

Someone suggested that Helmert coding was better for factors with more than
two levels, so I tried running the same models except with Helmert coding
[contrasts(SoleTrain$SyntaxType.f) = contr.helmert(3)], but the models come
out identical to the way they do with dummy coding. So why does the model
with the main effect of Listener removed come out the same as the model with
the main effect of Listener retained?

Any suggestions as to what I'm doing wrong?

Thanks!
Rachel



