On Tue, 26 Jun 2012, Marc Schwartz wrote:
On Jun 26, 2012, at 2:10 PM, SSimek wrote:
Hello,
I have count data that record the presence or absence of individuals in
my study population. I laid a grid of cells over the study area and
calculated a count value for each individual per season per year for each
grid cell. The count value is the number of times an individual was present
in each grid cell. For illustration, my data columns look something like
this and are repeated for each individual:
Cell_ID  Param1      Param2  Param3   Param4   COUNT  Name  Year  Season  Cov
1        160.565994  729.08  1503     7930.3   0      AA    2010  AUT     Open
1        160.565994  729.08  1503     7930.3   22     AA    2011  SPR     Open
1        160.565994  729.08  1503     7930.3   12     AA    2009  SUM     Open
1        160.565994  729.08  1503     7930.3   0      AA    2010  SUM     Open
2        169.427001  491.87  1503.31  5101.09  0      AA    2010  AUT     oldHard
2        169.427001  491.87  1503.31  5101.09  16     AA    2011  SPR     oldHard
2        169.427001  491.87  1503.31  5101.09  0      AA    2009  SUM     oldHard
2        169.427001  491.87  1503.31  5101.09  0      AA    2010  SUM     oldHard
...
563      86.777099   612.69  977      4474.6   62     AA    2010  AUT     Water
563      86.777099   612.69  977      4474.6   12     AA    2011  SPR     Water
563      86.777099   612.69  977      4474.6   55     AA    2009  SUM     Water
1        160.565994  729.08  1503     7930.3   0      BB    2010  SUM     Open
2        169.427001  491.87  1503.31  5101.09  72     BB    2010  SUM     oldHard
5        160.75      614.95  1503.31  2878.98  16     BB    2010  SUM     medHard
6        170.404998  510.58  1489.44  743.14   0      BB    2010  SUM     Water
...
563      86.777099   612.69  977      4474.6   0      BB    2010  SUM     Water
1        160.565994  729.08  1503     7930.3   14     C     2005  AUT     Open
1        160.565994  729.08  1503     7930.3   0      C     2006  AUT     Open
1        160.565994  729.08  1503     7930.3   0      C     2006  SPR     Open
1        160.565994  729.08  1503     7930.3   56     C     2007  SPR     Open
1        160.565994  729.08  1503     7930.3   0      C     2006  SUM     Open
2        169.427001  491.87  1503.31  5101.09  124    C     2005  AUT     oldHard
2        169.427001  491.87  1503.31  5101.09  231    C     2006  AUT     oldHard
2        169.427001  491.87  1503.31  5101.09  889    C     2006  SPR     oldHard
2        169.427001  491.87  1503.31  5101.09  0      C     2007  SPR     oldHard
...
563      86.777099   612.69  977      4474.6   0      C     2005  AUT     Water
563      86.777099   612.69  977      4474.6   231    C     2006  AUT     Water
563      86.777099   612.69  977      4474.6   185    C     2006  SPR     Water
563      86.777099   612.69  977      4474.6   123    C     2007  SPR     Water
563      86.777099   612.69  977      4474.6   52     C     2006  SUM     Water
I have 563 grid cells across my study area, and each individual has 1-563
cells associated with it for each year and each season in which it was
monitored. Therefore my grid cells are repeated. I end up with 71,000
records, of which 925 have a Count value > 0 and 70,075 have a Count value
of 0.
I wanted to run a zero-inflated Poisson mixed-effects model of the
parameters, with individual as the random effect. But I have been advised
two things:
1. I cannot run a zero-inflated Poisson model because my data are too
"extremely" inflated (i.e. 70,075 zeros vs. 925 non-zeros), and
2. I cannot run the model with each cell repeated for each individual. I am
told the model doesn't recognize that Cell_ID #1 for individual "A" is the
same as Cell_ID #1 for individual "B".
Does anyone know if either or both of these points are true? I would
appreciate any thoughts, advice, or suggestions.
Thanks!
-Stephanie
Hi Stephanie,
Some comments:
1. You should think about, or at least be open to, a zero-inflated negative
binomial distribution rather than a zero-inflated Poisson.
2. You should at least review the vignette for the pscl CRAN package, which
provides standard fixed-effects models and related functions for count
data and, importantly, some good conceptual content:
http://cran.r-project.org/web/packages/pscl/vignettes/countreg.pdf
3. Given the repeated measures framework and correlation issues you likely
have, you should subscribe to and re-post your query to the R-sig-mixed-models
list:
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
which will avail you of experts in the field.
4. There is also a draft FAQ for mixed models here:
http://glmm.wikidot.com/faq
which I believe is maintained by Ben Bolker, who actively participates in the
above list. Based upon the content there, I suspect that you will be pointed to
the glmmADMB package which is on R-Forge
(http://glmmadmb.r-forge.r-project.org/) and can handle zero inflated mixed
effects models of at least some types.
5. If all else fails, just to plant a seed, you might want to consider a
mixed effects logistic regression model with a binary response, since
you appear to have a relatively small "event" incidence in your data.
The above list will also be helpful in that setting and you would likely
be pointed to the glmer() function in the lme4 package for that
application, which provides for GLMs in a mixed effects framework.
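A minimal sketch of what point 5 could look like in R, assuming the counts
are collapsed to a 0/1 indicator and that a random intercept per individual
is an adequate random-effects structure (that choice is an assumption here,
not something settled in this thread). The data are simulated; only the
column names COUNT, Name and Param1 come from the post, and `dat` and
`present` are made-up names for illustration.

```r
# Toy data: column names follow the post, values are simulated.
set.seed(1)
dat <- data.frame(
  Name   = factor(rep(paste0("ID", 1:20), each = 50)),  # 20 hypothetical individuals
  Param1 = runif(1000, min = 80, max = 180),
  COUNT  = rpois(1000, lambda = 0.05)                    # mostly zeros
)

# Collapse the counts to a binary "present in the cell at all" indicator.
dat$present <- as.integer(dat$COUNT > 0)

# Share of non-zero records in the toy data.
print(mean(dat$present))

# Mixed-effects logistic regression with a random intercept per individual.
# Needs the lme4 package; the fit is skipped if lme4 is not installed.
if (requireNamespace("lme4", quietly = TRUE)) {
  fit <- lme4::glmer(present ~ scale(Param1) + (1 | Name),
                     data = dat, family = binomial)
  print(summary(fit))
}
```

Rescaling the covariate with scale() is only there to help the optimizer;
with real data the formula would list whichever parameters are of interest.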
Thanks, Marc, all very useful points! Just one addition:
I would recommend starting with the last point - a binary response
regression (for y > 0). This could be considered as the zero-hurdle of a
hurdle regression.
Hurdle regressions are an alternative to zero-inflated models, but have
the nice property that you can estimate the two parts of the hurdle
separately: (1) a binary regression for y = 0 vs. y > 0, and (2) a
truncated count model for y, estimated only from the observations with
y > 0. The "pscl" package contains a hurdle() function which estimates both
parts in one go (and the "countreg" vignette gives more details and
references), but in this case it would probably be useful to estimate them
separately.
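To illustrate the "separately vs. in one go" point, here is a rough sketch
on simulated data with no random effects. The names `toy`, `x` and `y` and
all the coefficients are invented for the example. The binary part is fit
with plain glm(); if pscl is installed, hurdle() is fit on the same data so
that the zero-hurdle coefficients can be compared by eye.

```r
# Simulated hurdle data: a logit model decides zero vs. non-zero,
# and non-zero counts come from a zero-truncated Poisson.
set.seed(123)
n  <- 5000
x  <- rnorm(n)
nonzero <- rbinom(n, size = 1, prob = plogis(-3 + x))
mu <- exp(1 + 0.3 * x)
# Inverse-CDF draw from a Poisson conditioned on being >= 1.
y_pos <- qpois(runif(n, min = dpois(0, mu), max = 1), mu)
toy <- data.frame(y = ifelse(nonzero == 1, y_pos, 0), x = x)

# Part (1) on its own: binary regression for y > 0 vs. y = 0.
zero_part <- glm(I(y > 0) ~ x, data = toy, family = binomial)
print(coef(zero_part))

# Both parts in one go with pscl::hurdle(), if pscl is available.
if (requireNamespace("pscl", quietly = TRUE)) {
  both <- pscl::hurdle(y ~ x, data = toy,
                       dist = "poisson", zero.dist = "binomial")
  print(coef(both, model = "zero"))
}
```

Fitting the binary part by itself makes it easier to swap in other tools
for that step, for example a mixed-effects fit, without touching the count
part.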
In any case, both parts will need care because the binary response
probably contains a lot of (quasi-)complete separations because non-zeros
are so rare. Conversely, the truncated count model may be hard to estimate
because there are no observations for a lot of parameter combinations. But
estimating the models separately will give you more flexibility in
addressing these issues.
To estimate the zero-truncated count distributions, you may consider the
"countreg" package from R-Forge which uses the same code as (one part of)
the hurdle() function.
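For readers who want to see what the zero-truncated part amounts to, below
is a hand-rolled sketch in base R. It is not the countreg code, just a
direct maximum-likelihood fit of a zero-truncated Poisson with a log link on
simulated data; `x`, `y`, `nll` and the true coefficients (0.5, 0.5) are
assumptions made up for the example.

```r
# Simulate strictly positive counts from a zero-truncated Poisson.
set.seed(42)
n  <- 2000
x  <- rnorm(n)
mu <- exp(0.5 + 0.5 * x)
y  <- qpois(runif(n, min = dpois(0, mu), max = 1), mu)

# Average negative log-likelihood of the zero-truncated Poisson:
# log f(y | y > 0) = log dpois(y, mu) - log(1 - exp(-mu)).
nll <- function(beta) {
  mu <- exp(beta[1] + beta[2] * x)
  -mean(dpois(y, mu, log = TRUE) - log1p(-exp(-mu)))
}

fit <- optim(c(0, 0), nll, method = "BFGS")
est <- fit$par
print(est)  # estimates of intercept and slope
```

With real data, y would be the subset of records with COUNT > 0, and the
linear predictor would contain the parameters of interest.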
hth,
Z
Regards,
Marc Schwartz
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.