Hi,
I am fitting GAMs with gam() from the mgcv package. My response variable (Y)
is binary (0/1), and my dataset contains repeated measures on 110 individuals,
with the same number of 0s and 1s within each individual (e.g. 345 zeros and
345 ones for individual A, 226 zeros and 226 ones for individual B, etc.). The
variable Factor separates the individuals into three groups according to mass
(groups 0, 1, 2). Factor1 is a binary variable coding for individuals of
group 1, and Factor2 is a binary variable coding for individuals of group 2.

I use models of this sort, with the random effect coded as an
s(..., bs="re") term:
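# gm: one 2-d smooth per level of Factor, plus a random intercept per individual
gm <- gam(Y ~ Factor + te(x1, x2, by = Factor) + s(Individual, bs = "re"),
          data = Data, family = binomial(link = logit), method = "REML")
# gm1: a shared smooth plus smooths multiplied by the 0/1 indicators
gm1 <- gam(Y ~ Factor + te(x1, x2) + te(x1, x2, by = Factor1) +
             te(x1, x2, by = Factor2) + s(Individual, bs = "re"),
           data = Data, family = binomial(link = logit), method = "REML")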
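
To make the coding of the variables concrete, here is a minimal sketch of how
a data frame with this layout could be prepared. The toy values are invented
for illustration and are not the real dataset. The sketch assumes that
Individual and Factor are stored as factors and that Factor1 and Factor2 are
numeric 0/1 indicators derived from Factor.

```r
# Toy data with the same layout as described above (values are made up).
set.seed(42)
Data <- data.frame(
  Individual = rep(c("A", "B", "C"), each = 4),
  Factor     = rep(c(0, 1, 2), each = 4),   # mass group of each individual
  x1         = runif(12),
  x2         = runif(12),
  Y          = rep(c(0, 1), times = 6)      # equal number of 0s and 1s per individual
)

# s(Individual, bs = "re") expects a factor, as does a factor 'by' variable.
Data$Individual <- factor(Data$Individual)
Data$Factor     <- factor(Data$Factor)

# Numeric 0/1 indicators for groups 1 and 2 (assumed coding of Factor1/Factor2).
Data$Factor1 <- as.numeric(Data$Factor == "1")
Data$Factor2 <- as.numeric(Data$Factor == "2")

str(Data)
```

With a factor by variable, te(x1, x2, by = Factor) produces one smooth per
level, whereas a numeric by variable such as Factor1 multiplies a single
smooth by the indicator.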

1) First question: is it OK to use gam() to model a binary response with
random effects coded as a bs="re" term?
I have read that gamm4() performs better than gamm() for binary responses
when the random effects are specified as random=~(1|Individual). Does that
mean that a binary response should not be used in gam() with random effects
coded as bs="re"?

2) Second question: for some models I obtain p-value=NA and Chi-square=0 for
the s(Individual) term, and for other models p-value=1 and a high Chi-square.
The models that yield a p-value and those that do not differ only slightly:
for example, using a variable x3 instead of x2 can change the result from
p-value=NA to p-value=1. Does anyone know what might be happening?
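
For anyone who wants to look at the same quantities, here is a minimal sketch
of how the s(Individual) row can be pulled out of the model summary. It uses
simulated data and a simpler model with a single 1-d smooth, so the object
names (toy, m) and the numbers are illustrative only and do not reproduce the
NA result from my models.

```r
library(mgcv)

# Simulated binary data with a random intercept per individual (made up).
set.seed(1)
n_id  <- 20
n_per <- 30
toy <- data.frame(
  Individual = factor(rep(seq_len(n_id), each = n_per)),
  x1         = runif(n_id * n_per)
)
b     <- rnorm(n_id, sd = 0.8)                       # true random intercepts
eta   <- sin(2 * pi * toy$x1) + b[as.integer(toy$Individual)]
toy$Y <- rbinom(nrow(toy), size = 1, prob = plogis(eta))

m <- gam(Y ~ s(x1) + s(Individual, bs = "re"),
         data = toy, family = binomial(link = "logit"), method = "REML")

# Table of smooth terms: edf, Ref.df, Chi.sq and p-value for each term.
st <- summary(m)$s.table
print(st["s(Individual)", ])
```

The edf entry of that row shows how much of the random effect is retained
after penalisation, which may be worth comparing between the models that
give p-value=NA and those that give p-value=1.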

3) Third question: this is not linked to random effects but rather to what
the two models gm and gm1 are actually testing. From my understanding, the
first model creates a 2-d smooth for each level of my Factor variable and
tests whether those smooths are significantly different from a straight line.
The second model also creates three smooths: one for the reference level of
Factor (group 0), one for the difference between the reference smooth and the
smooth for group 1, and one for the difference between the reference smooth
and the smooth for group 2. summary(gm1) gives a p-value for each of those
three smooths, which determine: whether the reference smooth is different
from 0, whether the smooth for group 1 is different from the reference
smooth, and whether the smooth for group 2 is different from the reference
smooth.
Do I understand correctly what the models are testing? The estimated edf for
te(x1,x2):Factor2 is 3.013 in the gm1 model, while it is 19.57 in the gm
model. Does that mean that the difference between the reference smooth
te(x1,x2) and the smooth for group 2, te(x1,x2,by=Factor2), is "small" enough
to be modelled with only about 3 degrees of freedom? The associated p-value
is nevertheless highly significant.
When comparing AIC between the gm and gm1 models, I sometimes find that gm1
has a lower AIC than gm. How can that be interpreted?
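
For reference, here is a minimal sketch of the comparison described in this
third question, run on simulated data. The data frame toy and the simulated
effect sizes are invented and only mimic the structure of my data (three
groups, repeated measures per individual), so the edf and AIC values it
prints say nothing about my real models.

```r
library(mgcv)

# Simulated data: 30 individuals in 3 groups, 40 observations each (made up).
set.seed(123)
n_id  <- 30
n_per <- 40
toy <- data.frame(
  Individual = factor(rep(seq_len(n_id), each = n_per)),
  Factor     = factor(rep(0:2, each = (n_id / 3) * n_per)),
  x1         = runif(n_id * n_per),
  x2         = runif(n_id * n_per)
)
toy$Factor1 <- as.numeric(toy$Factor == "1")   # 0/1 indicator for group 1
toy$Factor2 <- as.numeric(toy$Factor == "2")   # 0/1 indicator for group 2

b   <- rnorm(n_id, sd = 0.5)                   # random intercepts
eta <- sin(2 * pi * toy$x1) +
  2 * (toy$x1 - toy$x2) * toy$Factor1 +
  cos(2 * pi * toy$x2) * toy$Factor2 +
  b[as.integer(toy$Individual)]
toy$Y <- rbinom(nrow(toy), size = 1, prob = plogis(eta))

# Same model structures as gm and gm1 in the post.
gm <- gam(Y ~ Factor + te(x1, x2, by = Factor) + s(Individual, bs = "re"),
          data = toy, family = binomial(link = "logit"), method = "REML")
gm1 <- gam(Y ~ Factor + te(x1, x2) + te(x1, x2, by = Factor1) +
             te(x1, x2, by = Factor2) + s(Individual, bs = "re"),
           data = toy, family = binomial(link = "logit"), method = "REML")

# Effective degrees of freedom of each smooth term in the two models.
print(summary(gm)$s.table[, "edf"])
print(summary(gm1)$s.table[, "edf"])

# Side-by-side AIC (columns: df and AIC).
cmp <- AIC(gm, gm1)
print(cmp)
```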
Thanks a lot if anyone can help...
Geraldine





______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
