The zero problem is common in other fields, such as the analysis of water quality data, where concentrations of chemicals are often near or below detection limits. Dennis Helsel's book Nondetects and Data Analysis provides some thoughtful discussions, including how a failure to properly account for nondetects contributed to the Challenger space shuttle explosion, as well as some nonparametric models and censored data models. As to planning for zeros in the experimental design, in my limited experience sometimes it will be apparent that the data will have many zeros and other times it won't. Given that many ecological experiments are at field sites where the responses aren't known ahead of time, the latter is often the case. I tend to plan both ways now, but I'd have to say that, despite having more than an average amount of statistical training, I find the necessary expertise for the statistical designs to be a bit beyond me, and good statistical advice in this particular area to be rare.
To me it is also not always clear how to interpret the zeros, since I often don't know their cause. If someone could tell me how to do the equivalent of a three-way randomized complete blocks design with blocks as random effects, including the treatment interactions, I'd be really happy, and I could publish some data that have been sitting for 10 years waiting for the appropriate statistical techniques to be developed.

John Gerlach

----- Original Message ----
From: Michele Scardi <[EMAIL PROTECTED]>
To: [email protected]
Sent: Monday, January 14, 2008 4:58:18 AM
Subject: Re: [ECOLOG-L] Data set with many many zeros..... Help?

Monday, January 14, 2008, 4:11:24 AM, Warren Aney wrote:

WWA> Bill, are we the Luddites in this arena? I agree with you, and my
WWA> statistics professor would have taken it one important step further: Choose
WWA> your statistical analysis methods before you start collecting your data --
WWA> ...

"To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of."
Sir Ronald Aylmer Fisher

As for the "Data set with many zeros" thread, I'm sure that ZIP, ZINB, ZAP, ZANB, etc. are absolutely appropriate (and effective) in some cases, but the real problem is that in some cases zeros cannot be regarded as plain numbers. They are just a completely different beast.

Imagine you are relating the abundance of a given species to an environmental variable (e.g. temperature). Usually you'll find an optimum value (= max abundance) and a range of values that are associated with decreasing abundances. But what about zeros? Zeros can mean:

- species present, but not found;
- species absent because of other reasons (i.e. temperature is ok, something else is not ok: e.g. competition);
- species absent because temperature is too low;
- species absent because temperature is too high;
- species absent because temperature is way too low;
- species absent because temperature is way too high;
- etc.

Basically, all these responses produce a zero abundance (i.e. they are quantitatively identical), but they are completely different from the qualitative point of view.

So, the bottom line is that data transformations as well as ZIP, ZINB, etc. models can solve some problems, but in some other cases what is really relevant is the meaning of the zero vs. non-zero values, and in those cases a qualitative approach (contingency tables, chi-square stats, etc., but keep it as simple as possible!) can be a better option.

Best,

Michele

--------------------------------
Michele Scardi
Associate Professor of Ecology
Department of Biology
University of Rome "Tor Vergata"
Via della Ricerca Scientifica
00133 Roma
Italy
http://www.mare-net.com/mscardi
--------------------------------
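
A minimal sketch of the qualitative approach mentioned in the message above (zero vs. non-zero in a contingency table, tested with chi-square), not anything taken from the thread itself. It assumes sites are grouped into three temperature classes and abundance is reduced to presence/absence. The counts are invented purely for illustration, and the helper name chi_square_independence is hypothetical. The p-value shortcut is valid only because this table shape gives exactly 2 degrees of freedom.

```python
import math

# Hypothetical counts of sampling sites (invented for illustration only).
# Keys: temperature class; values: (species absent, species present).
table = {
    "low":     (18, 2),
    "optimal": (5, 15),
    "high":    (16, 4),
}


def chi_square_independence(rows):
    """Return the Pearson chi-square statistic and degrees of freedom
    for an r x c table of observed counts."""
    observed = [list(r) for r in rows]
    row_totals = [sum(r) for r in observed]
    col_totals = [sum(c) for c in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if presence/absence were independent of class.
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


stat, dof = chi_square_independence(table.values())

# With exactly 2 degrees of freedom the chi-square upper tail is exp(-x / 2).
# Any other table shape needs a general chi-square routine instead.
p_value = math.exp(-stat / 2) if dof == 2 else None

print(f"chi-square = {stat:.2f}, df = {dof}, p = {p_value}")
```

A test like this only says whether zeros are distributed unevenly across the temperature classes. It cannot tell a "too cold" zero from a "present but not found" zero, so the interpretation of each zero still has to come from knowledge of the sites.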
