Dear Ecologists,
I've been using an information-theoretic model-selection approach as a
part of my research and have found that the ecological literature
appears to be very hypocritical and inconsistent in how these stats are
used and interpreted. I've been consulting a statistician and he has
verified and clarified some interesting problems, about which I'd love
to hear your comments.
1. Data dredging. Suppose you start with 6 independent variables and a
single dependent variable. Historically, the recommended approach is to
choose a set of a priori models containing combinations of these
variables that make ecological sense, then rank them using AIC scores
and weights. Running all possible combinations of these 6 variables has
historically been looked down upon because type I error goes through
the roof. For example, in hypothesis testing with a critical p-value of
.05, 1 out of every 20 models you run will appear statistically
significant by chance alone. There are not yet any methods to account
for the type I error associated with running a bunch of spurious models
in the AIC ranking approach. Why, then, do I see so many papers in so
many highly ranked ecological journals (e.g., Ecology, Ecology Letters,
Ecological Applications) that run all possible combinations of
variables anyway?
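To make the multiple-comparisons problem concrete, here is a small
simulation of my own (not taken from any of the papers in question): a
response with no real relationship to any of 6 candidate predictors,
where we simply check how often at least one predictor looks
"significant" at p < .05.

```python
# Hypothetical simulation: 6 predictors, none truly related to y.
# How often does at least one appear "significant" at p < .05?
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(42)
n_obs, n_vars, n_sims = 50, 6, 2000

false_hits = 0
for _ in range(n_sims):
    y = rng.normal(size=n_obs)            # response unrelated to everything
    X = rng.normal(size=(n_obs, n_vars))  # 6 independent noise predictors
    pvals = [pearsonr(X[:, j], y)[1] for j in range(n_vars)]
    if min(pvals) < 0.05:                 # any variable looks "significant"
        false_hits += 1

print(false_hits / n_sims)  # roughly 1 - 0.95**6, i.e. about 0.26
```

With 6 independent tests, the chance of at least one false positive is
about 1 - 0.95^6 = 26%, and an all-subsets search over the same 6
variables can only make matters worse.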
2. Summing of AIC scores. People who run all combinations of variables
in their model selection approach will often sum up the AIC scores of
all models containing variable 1, then do the same for variable 2, and
so on. The total of these scores for each variable is supposed to
reflect its importance. The approach seems problematic because it is
built on data dredging (above), but it seems common in journals like
Ecology, Ecology Letters, etc. I actually saw one paper in the Journal
of Biogeography in which the author chose a set of a priori models to
run, then took this summing approach. Wouldn't this just show that the
most important variables were the ones the analyst thought were
important a priori?
3. Use of other 'fit' statistics along with the model-selection
approach. I often see people reporting other statistics (e.g. p-vals,
r-squared) in combination with the AIC scores. My statistician friend
says that this is totally inappropriate and uninformative.
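One reason the two kinds of statistics can mislead when mixed, which I
can sketch with a toy example of my own: R-squared can never decrease
when a predictor is added, while AIC penalizes the extra parameter, so
the two need not agree about which model is "better."

```python
# Toy illustration: adding a pure-noise predictor to a regression.
# R^2 can only go up; AIC is usually made worse by the penalty.
import numpy as np

rng = np.random.default_rng(1)
n = 40
x1 = rng.normal(size=n)
noise_var = rng.normal(size=n)     # predictor with no real relationship
y = 2.0 * x1 + rng.normal(size=n)

def fit_stats(y, *cols):
    """Return (R^2, AIC) for an OLS fit with the given predictor columns."""
    design = np.column_stack([np.ones(len(y)), *cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = np.sum((y - design @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    r2 = 1 - rss / tss
    aic = len(y) * np.log(rss / len(y)) + 2 * design.shape[1]
    return r2, aic

r2_small, aic_small = fit_stats(y, x1)
r2_big, aic_big = fit_stats(y, x1, noise_var)
print(r2_big >= r2_small)   # always True: R^2 cannot drop in a nested fit
print(aic_big - aic_small)  # usually positive: AIC punishes the noise term
```

So a model that "wins" on R-squared may lose on AIC, which is part of
why reporting both side by side invites confusion.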
My general impression is that, while the statistical world has yet to
develop more robust techniques (e.g., accounting for type I error in
model selection), there are clear recommendations that make some
approaches (e.g., data dredging) clearly improper. Please comment on
whether ecologists are simply not 'following the rules' (perhaps out of
ignorance) or whether there really are different and statistically
valid opinions on this topic.
Many thanks to all,
--
Bruce Robertson
Research Associate
Kellogg Biological Station
Michigan State University
3700 East Gull Lake Drive
Hickory Corners, MI 49060
206-71-9172
[email protected]
Homepage: www.msu.edu/~roberba1/Index.html/