Dear Ecologists,

I've been using an information-theoretic model-selection approach as part of my research and have found that the ecological literature appears to be inconsistent, and sometimes self-contradictory, in how these statistics are used and interpreted. I've been consulting a statistician, and he has verified and clarified some interesting problems, about which I'd love to hear your comments.

1. Data dredging. Start with 6 independent variables and a single dependent variable. Historically, the recommended approach is to choose a set of a priori models containing combinations of these variables that make ecological sense, then rank them using AIC scores and weights. Running all possible combinations of these 6 variables has historically been looked down upon because the type I error rate (false positives) goes through the roof. For example, in hypothesis testing with a critical p-value of .05, 1 out of every 20 models you run will appear statistically significant by chance alone. There are not yet any methods to account for the error rate associated with running a bunch of spurious models in the AIC ranking approach. Why, then, do I see so many papers in so many highly ranked ecological journals (e.g., Ecology, Ecology Letters, Ecological Applications) that run all possible combinations of variables anyway?
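To make the concern concrete, here is a minimal simulation sketch (not from the original post; all numbers are illustrative): fit every subset of 6 pure-noise predictors to a pure-noise response and count how often the best all-subsets model beats the intercept-only model on AIC. Even with no real signal, the exhaustive search "finds" a winning model far more often than the nominal error rate would suggest.

```python
# Hypothetical simulation: all-subsets AIC selection on pure noise.
import itertools
import numpy as np

rng = np.random.default_rng(42)

def ols_aic(X, y):
    """AIC of an OLS fit (Gaussian likelihood, additive constants dropped)."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    k = X.shape[1] + 1  # regression coefficients + error variance
    return n * np.log(rss / n) + 2 * k

n_sims, n, p = 200, 50, 6
hits = 0
for _ in range(n_sims):
    y = rng.normal(size=n)           # response: pure noise
    Z = rng.normal(size=(n, p))      # 6 predictors: also pure noise
    ones = np.ones((n, 1))
    null_aic = ols_aic(ones, y)
    best_aic = min(
        ols_aic(np.hstack([ones, Z[:, list(s)]]), y)
        for r in range(1, p + 1)
        for s in itertools.combinations(range(p), r)
    )
    hits += best_aic < null_aic      # a noise model "won" the ranking

print(f"best all-subsets model beat the null in {hits}/{n_sims} noise datasets")
```

In repeated runs this fraction is typically well above 50%, which is the "spurious models" problem in miniature: the more candidate models you rank, the more likely one of them looks good by chance.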

2. Summing of Akaike weights. People who run all combinations of variables in their model-selection approach will often sum the Akaike weights (derived from AIC scores) of all models containing variable 1, then do the same for variable 2, and so on. The total of these weights for each variable is supposed to reflect its importance. The approach seems problematic because it is built on data dredging (above), yet it seems common in journals like Ecology and Ecology Letters. I actually saw one paper in the Journal of Biogeography in which the author chose to select a set of a priori models to run, then took this summing approach. Wouldn't this just show that the most important variables were the variables that the analyst thought were important a priori?
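For readers unfamiliar with the recipe being questioned, here is a minimal sketch of the summed-weights calculation. The variable names and AIC scores are entirely hypothetical; the point is only to show the mechanics: AIC differences become Akaike weights, and each variable's "importance" is the sum of the weights of the models that contain it.

```python
# Hypothetical AIC scores for a small all-subsets candidate set (3 variables).
import math

aic = {
    ("temp",):                212.4,
    ("rain",):                218.1,
    ("soil",):                219.7,
    ("temp", "rain"):         210.2,
    ("temp", "soil"):         213.9,
    ("rain", "soil"):         219.0,
    ("temp", "rain", "soil"): 211.8,
}

best = min(aic.values())
rel = {m: math.exp(-(a - best) / 2) for m, a in aic.items()}
total = sum(rel.values())
weights = {m: r / total for m, r in rel.items()}  # Akaike weights; sum to 1

# Summed weight of every model containing each variable.
importance = {
    v: sum(w for m, w in weights.items() if v in m)
    for v in ("temp", "rain", "soil")
}
print(importance)
```

Note that each variable's summed weight depends entirely on which candidate models were entered in the first place, which is exactly the circularity the post raises about applying this to a hand-picked a priori set.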

3. Use of other 'fit' statistics along with the model-selection approach. I often see people reporting other statistics (e.g., p-values, R-squared) in combination with the AIC scores. My statistician friend says that this is totally inappropriate and uninformative.
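One reason the two kinds of statistics don't mix well can be shown with a small simulation (again hypothetical data, not from the post): R-squared never decreases as predictors are added, so it always "rewards" the bigger model, while AIC charges a penalty of 2 per extra parameter. The two numbers are answering different questions, and reporting them side by side invites contradictory readings of the same model set.

```python
# Simulated example: adding pure-noise predictors always raises R^2,
# while AIC penalizes the extra parameters.
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)    # true model uses x only
noise = rng.normal(size=(n, 5))     # 5 irrelevant predictors

def fit(X):
    """Return (R^2, AIC) for an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    r2 = 1.0 - rss / float(np.sum((y - y.mean()) ** 2))
    aic = n * np.log(rss / n) + 2 * (X.shape[1] + 1)
    return r2, aic

ones = np.ones((n, 1))
r2_small, aic_small = fit(np.hstack([ones, x[:, None]]))
r2_big, aic_big = fit(np.hstack([ones, x[:, None], noise]))

print("R^2:", r2_small, "->", r2_big)   # R^2 can only go up
print("AIC:", aic_small, "->", aic_big)  # AIC pays for the extra parameters
```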

My general impression is that, while the statistical world has yet to develop more robust techniques (e.g., accounting for inflated type I error in model selection), there are clear recommendations that make some approaches (e.g., data dredging) clearly improper. Please comment on whether ecologists are simply not 'following the rules' (perhaps out of ignorance) or whether there really are different, statistically valid opinions on this topic.

Many thanks to all,

--
Bruce Robertson
Research Associate
Kellogg Biological Station
Michigan State University
3700 East Gull Lake Drive
Hickory Corners, MI 49060
206-71-9172
[email protected]
Homepage: www.msu.edu/~roberba1/Index.html/