On 26 Sep 2016 at 16:16, Henrik Singmann wrote:
Dear all,

I am somewhat late to the party (I was traveling until yesterday), but would also like to add my two cents to this discussion, especially as afex was mentioned here. afex::mixed() indeed essentially does what is suggested here: it creates numerical variables from your factors and fits them in a sequential manner such that you get tests of all effects. To get the likelihood-ratio tests you want, you need to set method = "LRT" in the call to mixed(). (The default method, "KR", uses Kenward-Roger for p-values, which is not appropriate for GLMMs; "PB", parametric bootstrap, can also be used for GLMMs.) In contrast to Dale's suggestion, mixed() by default uses R's built-in factor types, specifically contr.sum() (i.e., sum-to-zero) contrasts, which only use 1 and -1 as values. To create the variables it then uses model.matrix() and works from there.

I am not quite sure I agree with Daniel's point. The described procedure tests the null hypothesis of an effect at the (unweighted) mean of the other effects. I do not see how this should be nonsensical. In a factorial design this amounts to assuming that the other factors have no effect (i.e., there are no deviations from the unweighted grand mean produced by the levels of other factors).

Finally, as many have noted, the problem is that one cannot remove lower-order effects in R via the formula interface. A quick example (using lm) can be found here: http://stackoverflow.com/q/11335923/289572 This behavior unfortunately generalizes to all interaction orders and all model types (lm, lmer, glmer, ...). This behavior is also the reason why I developed afex::mixed(): to solve exactly this problem. If there are other modeling functions that are as useful as lmer or glmer, please let me know so I can make sure to also add the automatic testing for those functions.
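The formula-interface limitation Henrik mentions is easy to reproduce. Below is a minimal, self-contained sketch with simulated data (the names f1, f2, and y are illustrative, not from the thread): "removing" a main effect while keeping the interaction leaves the fit unchanged, because R re-expands the interaction term to preserve identifiability.

```r
set.seed(1)
d <- data.frame(f1 = factor(rep(c("a", "b"), each = 30)),
                f2 = factor(rep(c("x", "y", "z"), times = 20)),
                y  = rnorm(60))

m_full <- lm(y ~ f1 * f2, data = d)     # full factorial model
m_red  <- lm(y ~ f2 + f1:f2, data = d)  # attempt to drop the main effect of f1

## Both models end up with the same number of coefficients and identical fits:
length(coef(m_full)) == length(coef(m_red))       # TRUE
isTRUE(all.equal(fitted(m_full), fitted(m_red)))  # TRUE
```

The same pattern appears with lmer and glmer, which is why mixed() bypasses the formula interface entirely and builds the numeric predictors itself.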
In contrast to Dale, I believe that the probability of committing an error when making the numeric variables by hand is higher than when letting an algorithm do it.

Cheers,
Henrik

On 19 Sep 2016 at 13:56, Dale Barr wrote:

Hi Rachel,

I don't recommend using the built-in 'factor' type variables when using R linear model functions, because weird things happen when you try to drop things out of a model that includes interactions. It is better to create your own numeric predictors, e.g.:

    ## create centered predictor for listener
    ## NB: replace 'somelevel' with the name of one of the levels of your var
    SoleTrain$L <- (SoleTrain$Listener == "somelevel") -
      mean(SoleTrain$Listener == "somelevel")

    ## create 2 centered predictors for SyntaxType
    ## replace 'level1' and 'level2' with names of 2 levels of your var
    SoleTrain$ST1 <- (SoleTrain$SyntaxType == "level1") -
      mean(SoleTrain$SyntaxType == "level1")
    SoleTrain$ST2 <- (SoleTrain$SyntaxType == "level2") -
      mean(SoleTrain$SyntaxType == "level2")

These should give you values of -.5 and +.5 for the new predictor L, and something like -.33 and +.67 for each of your two new ST1 and ST2 variables (depending on how balanced your data are). Check that you are getting (approximately) the right values. The centering ensures that you are looking at main rather than simple effects. You can then fit your model as:

    mod <- glmer(Target_E2_pref ~ L * (ST1 + ST2) +
                   (1 + L * (ST1 + ST2) | Subject) +
                   (1 + L | Picture),
                 data = SoleTrain, family = binomial, verbose = TRUE,
                 control = glmerControl(optCtrl = list(maxfun = 20000)))

and drop out terms as needed (you might want to look at the update() function, which makes this easier).

Alternative solution: check out the afex package, which (I think) can do the factor coding and LRTs for you.

-D

Rachel Ostrand writes:

Hi everyone,

I'm having trouble with some 2-factor glmer models that I'm trying to run, such that the model with one of the main effects removed comes out identical to the full model.
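Dale's hand-coding recipe can be checked on a toy factor. A minimal, self-contained sketch (the data below are made up; only the coding scheme follows Dale's email): the resulting predictor is mean-centered, so with balanced data it takes exactly the values -.5 and +.5 he describes.

```r
## Toy factor: 5 "E1" rows and 5 "E2" rows (balanced, illustrative data)
listener <- factor(c("E1", "E2", "E2", "E1", "E2",
                     "E1", "E1", "E2", "E2", "E1"))

## Deviation-coded predictor, following Dale's recipe:
L <- (listener == "E2") - mean(listener == "E2")

## Rows with "E2" get 1 - p and rows with "E1" get -p, where p is the
## proportion of "E2" rows, so the predictor has mean zero:
sort(unique(L))  # -0.5  0.5 (exactly, because the data are balanced)
mean(L)          # 0
```

With unbalanced data the two coded values shift (e.g. -.33 and +.67 for a 3-level factor, as in Dale's email), but the mean stays zero, which is what makes the coefficient a main effect.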
Some colleagues suggested that this might be due to the coding of my factors, specifically because I have a factor with 3 levels, and that it needs to be treated differently, but I'm not sure how (or why) to do that.

Brief summary of my data:
- My DV (called Target_E2_pref) is a binary categorical variable.
- There are two categorical IVs: Listener (2 levels) and SyntaxType (3 levels).
- Listener varies by both subject and item (i.e., picture); SyntaxType only varies by subject.

When I dummy-coded my variables using contr.treatment(), the model with the main effect of Listener removed from the fixed effects comes out identical to the full model:

    SoleTrain = read.table(paste(path, "SoleTrain.dat", sep=""), header=T)
    SoleTrain$Listener.f = factor(SoleTrain$Listener, labels=c("E1", "E2"))
    contrasts(SoleTrain$Listener.f) = contr.treatment(2)
    SoleTrain$SyntaxType.f = factor(SoleTrain$SyntaxType,
                                    labels=c("Transitive", "Locative", "Dative"))
    contrasts(SoleTrain$SyntaxType.f) = contr.treatment(3)

    # Create full model:
    SoleTrain.full <- glmer(Target_E2_pref ~ Listener.f*SyntaxType.f +
                              (1 + Listener.f*SyntaxType.f|Subject) +
                              (1 + Listener.f|Picture),
                            data = SoleTrain, family = binomial, verbose=T,
                            control=glmerControl(optCtrl=list(maxfun=20000)))

    # Create model with main effect of Listener removed:
    SoleTrain.noListener <- glmer(Target_E2_pref ~ SyntaxType.f +
                                    Listener.f:SyntaxType.f +
                                    (1 + Listener.f*SyntaxType.f|Subject) +
                                    (1 + Listener.f|Picture),
                                  data = SoleTrain, family = binomial, verbose=T,
                                  control=glmerControl(optCtrl=list(maxfun=20000)))

    anova(SoleTrain.full, SoleTrain.noListener)

    Data: SoleTrain
    Models:
    SoleTrain.full: Target_E2_pref ~ Listener.f * SyntaxType.f +
        (1 + Listener.f * SyntaxType.f | Subject) + (1 + Listener.f | Picture)
    SoleTrain.noListener: Target_E2_pref ~ SyntaxType.f + Listener.f:SyntaxType.f +
        (1 + Listener.f * SyntaxType.f | Subject) + (1 + Listener.f | Picture)
                         Df    AIC    BIC  logLik deviance Chisq Chi Df Pr(>Chisq)
    SoleTrain.full       30 2732.5 2908.5 -1336.2   2672.5
    SoleTrain.noListener 30 2732.5 2908.5 -1336.2   2672.5     0      0          1

However, I don't have this problem when I test for the main effect of SyntaxType and remove the SyntaxType.f factor from the fixed effects. (That is, this produces a different model than the full model.) Someone suggested that Helmert coding was better for factors with more than two levels, so I tried running the same models with Helmert coding [contrasts(SoleTrain$SyntaxType.f) = contr.helmert(3)], but the models come out identical to the way they do with dummy coding.

So why does the model with the main effect of Listener removed come out the same as the model with the main effect of Listener retained? Any suggestions as to what I'm doing wrong?

Thanks!
Rachel
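One way to see why Rachel's two glmer calls give identical results is to compare their fixed-effects design matrices directly. The sketch below uses a stand-in data frame built from the factor levels in her email, not her actual data: the "reduced" formula yields a design matrix with the same number of columns, spanning the same column space as the full one, so the two models are the same model and the likelihood-ratio test is necessarily zero.

```r
## Stand-in data: one row per design cell (levels taken from Rachel's email)
d <- expand.grid(Listener.f   = factor(c("E1", "E2")),
                 SyntaxType.f = factor(c("Transitive", "Locative", "Dative")))

X_full <- model.matrix(~ Listener.f * SyntaxType.f, d)
X_red  <- model.matrix(~ SyntaxType.f + Listener.f:SyntaxType.f, d)

ncol(X_full)                   # 6
ncol(X_red)                    # 6: R re-expands the interaction; nothing is dropped
qr(cbind(X_full, X_red))$rank  # 6: both sets of columns span the same space
```

This also explains why switching to Helmert (or any other) contrasts made no difference: the contrast scheme changes the basis of the column space, not the space itself.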