Dear Ecology,

AIC = model deviance + 2 * (number of parameters).
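The formula above can be sketched numerically (a minimal illustration; the function name and the deviances/parameter counts below are hypothetical, not from the thread):

```python
def aic(deviance, n_params):
    """AIC = model deviance + 2 * (number of estimated parameters)."""
    return deviance + 2 * n_params

# Hypothetical example: model A fits a bit better (lower deviance) but
# uses five more parameters than the simpler model B.
print(aic(deviance=100.0, n_params=8))  # 116.0
print(aic(deviance=105.0, n_params=3))  # 111.0  <- the simpler model wins
```

Note how the parameter penalty lets the slightly worse-fitting but simpler model come out ahead, which is exactly the deviance-vs-complexity balance described below.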
In essence, AIC is calculated so that the "best" model balances decreasing the deviance of the model from the data (which we want) against keeping the model simple and/or relevant. The deviance will be small if the covariates (explanatory variables) are "good", but also if we simply include a ton of lousy covariates. AIC therefore penalizes excessive covariates by adding 2 * (number of parameters) (i.e., the betas estimated for each covariate and covariate interaction).

Whether to use AIC, R-squared, or both comes down to the model design and the results you are after. Do you want to explain the current state of a system and show which covariates are important? Minimize AIC. Do you want to predict future changes in the system? Maximize R-squared. This is my view from a statistics perspective, since I haven't studied model selection in a biological setting.

Thank you very much,
Sarah Fann

"Education is what survives when what has been learnt has been forgotten." - Fortune cookie

________________________________________
From: Ecological Society of America: grants, jobs, news [[email protected]] On Behalf Of Scott Crosbie [[email protected]]
Sent: Wednesday, February 10, 2010 11:47 AM
To: [email protected]
Subject: Re: [ECOLOG-L] AIC, data-dredging, and inappropriate stats

Hi eco-loggers,

There seems to be some confusion about what AIC and goodness-of-fit (GOF) statistics are for. I am NOT a statistician, but my understanding of the difference between AIC and GOF statistics is as follows: AIC should be used to judge the trade-off between precision and bias of a given model, and used in model selection only if relevant GOF statistics fail to reject the null hypothesis (that what is "expected" by the model does not significantly depart from what is "observed" in the data).
Sometimes AIC is referred to as "fit", but model "fitness" should first be quantified by GOF statistics, some of which make different assumptions of normality and/or focus on certain areas of the model and data to quantify fitness (see the differences between chi-squared, Kolmogorov-Smirnov, Cramér–von Mises, etc.). Which GOF statistics to focus on will depend partly on what assumptions you make in the process of data gathering and modeling.

Scott Crosbie

--- On Wed, 2/10/10, Michael Cooperman <[email protected]> wrote:

From: Michael Cooperman <[email protected]>
Subject: Re: [ECOLOG-L] AIC, data-dredging, and inappropriate stats - correction
To: [email protected]
Date: Wednesday, February 10, 2010, 6:14 AM

Hello fellow Ecologers-

Yesterday I offered a response to the original posting on issues surrounding the use of AIC. Sadly, I failed to proof what I wrote and thereby submitted some wrong stuff. I wrote, "if 2 different measures of fit (i.e., delta AIC value and r^2) support different conclusions...." Of course, delta AIC is not a measure of goodness of fit; it is a measure of the "quality" (i.e., information loss) of a given model in comparison to other tested models.

Hence, in my response to point 3 of the original post, which read:

3. Use of other 'fit' statistics along with the model-selection approach. I often see people reporting other statistics (e.g. p-values, r-squared) in combination with the AIC scores. My statistician friend says that this is totally inappropriate and uninformative.

My response should have been: since delta AIC and r^2 measure different things, I think it can be appropriate to report them together; not as equal measures for model selection, but with r^2 informing on the relative value of the AIC solution. That is, if AIC indicates model X is the best, but model X has an exceptionally low r^2 (assuming r^2 is suitable to use because the relationship is linear), then even the best model identified by AIC is still pretty weak.

Sorry for any confusion.

Michael
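Michael's point can be sketched numerically. In this minimal Python illustration, all AIC scores, observations, and fitted values are hypothetical, chosen only to show that the model ranked best by delta AIC can still explain little of the variance:

```python
def delta_aic(aics):
    """Delta AIC for each model: its AIC minus the minimum AIC in the set."""
    best = min(aics)
    return [a - best for a in aics]

def r_squared(y, y_hat):
    """Coefficient of determination for observed y vs fitted y_hat."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical AIC scores for three candidate models:
aics = [211.4, 213.0, 219.8]
print([round(d, 1) for d in delta_aic(aics)])  # [0.0, 1.6, 8.4]

# The first model "wins" on AIC, but in absolute terms it may still fit poorly:
y     = [1.0, 2.0, 3.0, 4.0]   # hypothetical observations
y_hat = [2.4, 2.5, 2.6, 2.7]   # hypothetical fitted values from that model
print(round(r_squared(y, y_hat), 3))  # 0.188 -- low r^2 despite delta AIC = 0
```

Delta AIC is purely relative (it compares candidates against each other), while r^2 is an absolute measure of explained variance, which is why reporting both can be informative rather than redundant.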
