Hello ecolog,
I disagree with the suggestion that maximizing R2 is a good way to
predict future changes to a system... maximizing R2 may produce a
perfect fit to your current data set, but you are fitting to the noise
as well as the signal, and such a model will likely perform poorly with
new data. If you want predictive power, you should probably still use a
parsimonious approach like AIC, since it tends to reject covariates
that have only a small impact on the model's predictive power.
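
A minimal sketch of the overfitting point above, on simulated data rather than any real data set. Everything here is an assumption made for illustration: the helper names (r_squared, fit_ols, design), the sample size, the single real covariate x, and the 20 pure-noise covariates. It fits ordinary least squares with more and more noise covariates and reports R2 on the fitting data and on a fresh sample drawn the same way.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - RSS / TSS."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_ols(X, y):
    """Least-squares coefficients for the design matrix X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def design(x, noise, k):
    """Intercept, the one real covariate, and the first k noise covariates."""
    return np.column_stack([np.ones(len(x)), x, noise[:, :k]])

rng = np.random.default_rng(0)
n = 30

# Hypothetical "current" data: y depends on x only; the noise columns are irrelevant.
x = rng.normal(size=n)
noise_cols = rng.normal(size=(n, 20))
y = 1.0 + 2.0 * x + rng.normal(size=n)

# Hypothetical "new" data generated by the same process.
x_new = rng.normal(size=n)
noise_new = rng.normal(size=(n, 20))
y_new = 1.0 + 2.0 * x_new + rng.normal(size=n)

in_sample, out_sample = [], []
for k in (0, 5, 10, 20):
    beta = fit_ols(design(x, noise_cols, k), y)
    in_sample.append(r_squared(y, design(x, noise_cols, k) @ beta))
    out_sample.append(r_squared(y_new, design(x_new, noise_new, k) @ beta))
    print(f"{k:2d} noise covariates: "
          f"R2 on fitting data = {in_sample[-1]:.3f}, "
          f"R2 on new data = {out_sample[-1]:.3f}")
```

Because the models are nested, R2 on the fitting data cannot go down as covariates are added, even though the added covariates are pure noise. R2 on the new sample has no such guarantee and will typically fall as the model chases the noise.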
Brian Mitchell
Date: Wed, 10 Feb 2010 16:36:18 -0500
From: "Fann, Sarah Lynn" <[email protected]>
Subject: Re: AIC, data-dredging, and inappropriate stats
Dear ecology,
AIC = model deviance + 2*(# of parameters).
In essence, AIC is constructed so that the "best" model balances two goals: reducing the deviance of the model from the data (we want this) and keeping the model simple and relevant. The deviance will be small if the covariates (explanatory variables) are "good", but it will also be small if we simply include a ton of lousy covariates. AIC therefore penalizes excess covariates by adding 2*(# of parameters), where the parameters are the Betas estimated for each covariate and covariate interaction. The model with the lowest AIC is preferred.
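
As a rough illustration of the formula above, here is a sketch that computes AIC for two least-squares fits to simulated data. The assumptions are mine, not part of the formula: the errors are taken to be Gaussian, "deviance" is taken to mean -2 times the maximized log-likelihood, only the Betas are counted as parameters (also counting the error variance would add the same constant to every model), and the names gaussian_deviance, aic, x, and junk are made up for the example.

```python
import numpy as np

def gaussian_deviance(y, y_hat):
    """-2 * maximized Gaussian log-likelihood, with the variance set to RSS / n."""
    n = len(y)
    rss = float(np.sum((y - y_hat) ** 2))
    sigma2 = rss / n
    return n * (np.log(2.0 * np.pi * sigma2) + 1.0)

def aic(deviance, n_params):
    """AIC = model deviance + 2 * (# of parameters)."""
    return deviance + 2 * n_params

rng = np.random.default_rng(1)
n = 50

# Hypothetical data: one covariate that matters and five that do not.
x = rng.normal(size=n)
junk = rng.normal(size=(n, 5))
y = 0.5 + 1.5 * x + rng.normal(size=n)

X_small = np.column_stack([np.ones(n), x])   # intercept + x: 2 Betas
X_big = np.column_stack([X_small, junk])     # plus 5 junk covariates: 7 Betas

results = {}
for name, X in (("small", X_small), ("big", X_big)):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    dev = gaussian_deviance(y, X @ beta)
    results[name] = (dev, aic(dev, X.shape[1]))
    print(f"{name:5s} model: deviance = {dev:.2f}, "
          f"parameters = {X.shape[1]}, AIC = {results[name][1]:.2f}")
```

The larger model always has a deviance at least as small as the smaller one, since the models are nested. Whether its AIC is also smaller depends on whether that drop in deviance exceeds the extra penalty of 2 per added parameter, which junk covariates usually fail to do.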
Whether to use AIC, R2, or both comes down to the model design and the
results you are after. Do you want to explain the current state of a system and
show which covariates are important? Minimize AIC. Do you want to predict
future changes in the system? Maximize R2.
This is my view from a Statistics perspective since I haven't studied model selection in a biological setting.
Thank you very much,
Sarah Fann