Re: [R] GLMM - Am I trying the impossible?

Prof Brian Ripley Thu, 18 Aug 2005 03:22:40 -0700

It is not supported to call anova() on a glmmPQL fit.

For the glmmPQL fit you show, you have very large parameter estimates fora logistic and have partial separation (as you comment on for the controlgroup): in that case PQL is not a reasonable method.


Try

fit <- glm(dead ~ Parasite * Bacteria, data=fish.data, family=binomial)
summary(fit)
anova(fit, test="Chisq")
fitted(fit)

and you have fitted values of zero (up to numerical tolerances).

This *is* discussed in MASS, around pp.198-9.

So there is little point in adding random efects for that model.  Now try

fit2 <- glmmPQL(dead ~ Parasite + Bacteria, random=~1|Tank,
                family=binomial, data=fish.data)
summary(fit2)

Fixed effects: dead ~ Bacteria + Parasite
                Value Std.Error  DF   t-value p-value
(Intercept) -4.243838 0.7519194 150 -5.644007  0.0000
Parasiteyes  1.264102 0.4205313   7  3.005964  0.0198
Bacteriayes  2.850640 0.7147180   7  3.988483  0.0053

which is pretty similar to the lmer fit you show.

I don't know what anova is doing for your lmer fit, but I do know that itshould not be working with sums of squares as is being reported.


On Thu, 18 Aug 2005, Pedro J. Aphalo wrote:

Dear all,

I have tried to calculate a GLMM fit with lmer (lme4) and glmmPQL
(MASS), I also used glm for comparison.


I think you missed what glm was trying to tell you.

I am getting very different results from different functions, and I
suspect that the problem is with our dataset rather than the functions,
but I would appreciate help in deciding whether my suspicions are right.
If indeed we are attempting the wrong type of analysis, some guidance
about what would be the right thing to do would be greatly appreciated.

The details:
The data:
The data are from the end-point of a survival experiment with fish. The
design of the experiment is a 2 x 2 factorial, with each factor
(Bacteria and Parasite) at two levels (yes and no). There were 16 fish
in each tank, and the treatment was applied to the whole tank. There
were in all 10 tanks (160 fish), with 2 tanks for controls (no/no), 2
tanks for (Parasite:yes/Bacteria:no) and 3 tanks for each of the other 2
treatments. A dead fish was considered a success, and a binomial family
with the default logit link was used in the fits. No fish died in the
control treatment (Is this the problem?).

Overall "probabilities" as dead/total for the four treatments were:
Paras Bact  Prob
no    no    0
yes   no    0.0625
no    yes   0.208
yes   yes   0.458

We are interested in testing main effects and interaction, but the
interaction is of special interest to us.

The data for "dead" are coded as 0/1 with 1 indicating a dead fish, and
the file has one row per fish.

Some results:

lme4  (ver 0.98-1, R 2.1.1, Windows XP)
~~~~

> fish1.lmerPQL <- lmer(dead ~ Parasite * Bacteria + (1|Tank),
data=fish.data, family=binomial)
Error in lmer(dead ~ Parasite * Bacteria + (1 | Tank), data = fish.data,  :
        Unable to invert singular factor of downdated X'X
In addition: Warning message:
Leading minor of size 4 of downdated X'X is indefinite
>

without the interaction:
> fish3.lmerPQL <- lmer(dead ~ Parasite + Bacteria + (1|Tank),
data=fish.data, family=binomial)
> anova(fish3.lmerPQL)
Analysis of Variance Table
         Df  Sum Sq Mean Sq   Denom F value Pr(>F)
Parasite  1   0.012   0.012 157.000  0.0103 0.9192
Bacteria  1   0.058   0.058 157.000  0.0524 0.8192
> summary(fish3.lmerPQL)
Generalized linear mixed model fit using PQL
Formula: dead ~ Parasite + Bacteria + (1 | Tank)
   Data: fish.data
 Family: binomial(logit link)
      AIC      BIC    logLik deviance
 141.3818 156.7577 -65.69091 131.3818
Random effects:
     Groups        Name    Variance    Std.Dev.
       Tank (Intercept)       5e-10  2.2361e-05
# of obs: 160, groups: Tank, 10

Estimated scale (compare to 1)  0.9318747

Fixed effects:
            Estimate Std. Error z value  Pr(>|z|)
(Intercept) -4.24380    0.79924 -5.3098 1.098e-07 ***
Parasiteyes  1.26407    0.44694  2.8283 0.0046801 **
Bacteriayes  2.85062    0.75970  3.7523 0.0001752 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) Prstys
Parasiteyes -0.429
Bacteriayes -0.898  0.093

Very different P-values from anova and summary.

MASS: (ver 7.2-17, R 2.1.1, Windows XP)
~~~~~

> summary(fish1.glmmpql)
Linear mixed-effects model fit by maximum likelihood
 Data: fish.data
       AIC      BIC   logLik
  1236.652 1255.103 -612.326

Random effects:
 Formula: ~1 | Tank
        (Intercept)  Residual
StdDev:  0.02001341 0.8944214

Variance function:
 Structure: fixed weights
 Formula: ~invwt
Fixed effects: dead ~ Parasite * Bacteria
                            Value Std.Error  DF     t-value p-value
(Intercept)             -18.56607  1044.451 150 -0.01777591  0.9858
Parasiteyes              15.85802  1044.451   6  0.01518311  0.9884
Bacteriayes              17.23107  1044.451   6  0.01649772  0.9874
Parasiteyes:Bacteriayes -14.69007  1044.452   6 -0.01406487  0.9892
 Correlation:
                        (Intr) Prstys Bctrys
Parasiteyes             -1
Bacteriayes             -1      1
Parasiteyes:Bacteriayes  1     -1     -1

Standardized Within-Group Residuals:
          Min            Q1           Med            Q3           Max
-1.028634e+00 -5.734674e-01 -2.886770e-01 -4.224474e-14  4.330155e+00

Number of Observations: 160
Number of Groups: 10
> anova(fish1.glmmpql)
                  numDF denDF   F-value p-value
(Intercept)           1   150 17.452150  <.0001
Parasite              1     6  4.136142  0.0882
Bacteria              1     6 12.740212  0.0118
Parasite:Bacteria     1     6  0.000198  0.9892
>

> anova(glmmPQL(dead~Bacteria*Parasite, random=~1|Tank,
family=binomial, data=fish.data))
iteration 1
                  numDF denDF   F-value p-value
(Intercept)           1   150 17.452150  <.0001
Bacteria              1     6  8.980833  0.0241
Parasite              1     6  7.895521  0.0308
Bacteria:Parasite     1     6  0.000198  0.9892
>

Now anova indicates significance, but summary gives huge P-values.

I have looked in MASS, ISwR, Fox's and Crawley's book, for hints, but I
have probably missed the right spots/books. Hints about what to read
will be also appreciated.

Many thanks in advance, and sorry about the long message.

The report on this research is obviously not yet published, but if the
dataframe would be of help, it can be found as saved from R at:
http://people.cc.jyu.fi/~aphalo/R_fish/fish.Rda. (Use load to read it
into R).

Pedro.
--
==================================================================
Pedro J. Aphalo
Department of Biological and Environmental Science
University of Jyväskylä
P.O. Box 35, 40351 JYVÄSKYLÄ, Finland
Phone  +358 14 260 2339
Mobile +358 50 3721504
Fax    +358 14 260 2321
mailto:[EMAIL PROTECTED]
http://www.jyu.fi/~aphalo/                       ,,,^..^,,,

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html

Re: [R] GLMM - Am I trying the impossible?

Reply via email to