Re: Data set with many many zeros..... Help?

Highland Statistics Ltd. Sun, 13 Jan 2008 08:03:46 -0800

On Sat, 12 Jan 2008 15:38:33 -0400, Stephen Cole <[EMAIL PROTECTED]> wrote:


>Hello Ecolog - I was wondering if anyone had any advice on the following
>problem.
>
>I have a data set that is infested by a plague of zeros that is causing me
>to violate all assumptions of classic parametric testing.  These are true
>zeros in that the organisms in question did not occur in my randomly sampled
>quadrats.  They are not "missing data"
>
>I have a fully nested hierarchical design.
>My response variable is density obtained from quadrat counts.
>My explanatory variables are as follows:
>
>Region                       (3 levels-fixed)
>Location(Region)         (4 levels - random)
>Site(Location(Region))  (4 levels - random)
>
>My plan was to analyze the data with a nested anova and then proceed to
>calculate variance components to allow me to parse out the variance that
>could be attributed to each spatial scale in my design.  Since it is known
>that violations of assumptions severely distort variance components in
>random factors, I would really like to clean up my data set to meet the
>assumptions but as of yet I have found no acceptable remedial measure.
>

Stephen,
The good news for you is that this is a common problem; it is called zero 
inflation. The solution is a zero inflated Poisson, zero inflated negative 
binomial, zero altered Poisson, or zero altered negative binomial GLM. 
The zero inflated ones are mixture models; the zero altered ones are 
two-part (hurdle) models. Just Google ZIP, ZINB, ZAP, ZANB (or hurdle 
models). There is a nice online pdf by Zeileis, Kleiber and Jackman, 
showing you how to do these analyses in R. The book by Cameron and 
Trivedi gives the maths. Our next book has a 40 page chapter on this stuff 
(in R), but that won't help you now.
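
To make the mixture idea concrete, here is a minimal sketch of the zero 
inflated Poisson (ZIP) probability function. It is in plain Python rather 
than R, only so that it runs without any packages. The names zip_pmf, lam 
and pi, and the values 2.0 and 0.4, are made up for illustration; they are 
not taken from any package or from your data.

```python
import math

def poisson_pmf(y, lam):
    """Ordinary Poisson probability of observing count y with mean lam."""
    return math.exp(-lam) * lam**y / math.factorial(y)

def zip_pmf(y, lam, pi):
    """Zero inflated Poisson: with probability pi the observation is an
    extra zero, otherwise it is drawn from a Poisson(lam) distribution
    (which can itself produce a zero)."""
    base = (1.0 - pi) * poisson_pmf(y, lam)
    return pi + base if y == 0 else base

# Illustrative (assumed) parameter values.
lam, pi = 2.0, 0.4

p0_poisson = poisson_pmf(0, lam)   # zeros expected under a plain Poisson
p0_zip = zip_pmf(0, lam, pi)       # zeros expected under the ZIP mixture
total = sum(zip_pmf(y, lam, pi) for y in range(60))  # should be close to 1

print(f"P(0) Poisson: {p0_poisson:.3f}")
print(f"P(0) ZIP:     {p0_zip:.3f}")
```

The ZIP puts much more probability on zero than a Poisson with the same 
lam, which is the pattern a data set with "many many zeros" shows.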

The difference between ZI and ZA is the nature of the zeros (false zeros 
or true zeros), and the difference between Poisson and NB is whether you 
have extra overdispersion in the counts themselves, or only overdispersion 
due to the zeros.
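
As a rough sketch of these two distinctions (again plain Python, with 
made-up names and values): a zero altered (hurdle) Poisson models all 
zeros in one binary part and the positive counts with a zero-truncated 
Poisson, and the negative binomial adds a quadratic term to the variance. 
The function names zap_pmf and nb_variance and the parameter k are 
assumptions for illustration, not the interface of any package.

```python
import math

def poisson_pmf(y, lam):
    """Ordinary Poisson probability of observing count y with mean lam."""
    return math.exp(-lam) * lam**y / math.factorial(y)

def zap_pmf(y, lam, p_zero):
    """Zero altered (hurdle) Poisson: a zero occurs with probability
    p_zero; positive counts follow a Poisson(lam) truncated at zero."""
    if y == 0:
        return p_zero
    truncated = poisson_pmf(y, lam) / (1.0 - math.exp(-lam))
    return (1.0 - p_zero) * truncated

def nb_variance(mu, k):
    """Variance of a negative binomial with mean mu and dispersion
    parameter k; the Poisson variance would simply be mu."""
    return mu + mu**2 / k

# Illustrative (assumed) values.
lam, p_zero = 2.0, 0.6
total = sum(zap_pmf(y, lam, p_zero) for y in range(60))  # close to 1

mu, k = 2.0, 0.5
print(f"P(0) hurdle:      {zap_pmf(0, lam, p_zero):.3f}")
print(f"Poisson variance: {mu:.1f}")
print(f"NB variance:      {nb_variance(mu, k):.1f}")
```

In the hurdle model the probability of a zero is p_zero itself, whatever 
lam is; in a zero inflated model the count part contributes zeros as well.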

Software in R for this stuff is reasonably new. Packages pscl and VGAM are 
good starting points.

The bad news is that I am not sure what you have in terms of software for 
ZIPs + random effects. Both Cameron and Trivedi and Hilbe (2007) discuss 
these methods in the context of random effects. There was a paper in 
Environmetrics (end of 2007) applying ZIP with spatial/temporal 
correlation on seal data...in R. There are more, all very recent, papers 
with ZIP/ZAP + random effects. You may have to write the software code for 
doing this...I don't know.

Having said that...you say that your random effects have 4 levels. I doubt 
that this is enough to estimate a variance! Perhaps you should consider 
them as fixed? See Pinheiro and Bates.

ZIP/ZAP is very interesting stuff!

Alain


Dr. Alain F. Zuur
First author of:   

1. Analysing Ecological Data (2007).  
Zuur, AF, Ieno, EN and Smith, GM. Springer. 680 p.
URL: www.springer.com/0-387-45967-7


2. Analysing Ecological data using GLMM and GAMM in R. (2008). 
Zuur, AF, Ieno, EN, Walker, N and Smith, GM. Springer.


3. An introduction to R for life scientists: - With a paper submission 
guide - (2008).
Zuur, AF, Ieno, EN and Meesters, EHGW. Springer


Other books: http://www.brodgar.com/books.htm


Statistical consultancy, courses, data analysis and software
Highland Statistics Ltd.
6 Laverock road
UK - AB41 6FN Newburgh
Tel: 0044 1358 788177
Email: [EMAIL PROTECTED]
URL: www.highstat.com
URL: www.brodgar.com

>Has anyone else run into this problem when analysing abundance data?  I am
>aware of conditional models, but I have no practical experience with them
>and I am not even sure how to proceed with analysis in that case.  I have
>been using the R program to tackle this problem and I have also found no
>advice on the r-help mailing list.
>
>Thanks for any help that can be provided
>
>Stephen Cole
>Marine Ecology Lab
>Saint Francis Xavier University
>=========================================================================
