Dear Ecology,

AIC = model deviance + 2 * (number of parameters).
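The formula above can be sketched numerically (a minimal illustration; the function name and the deviances/parameter counts below are hypothetical, not from the thread):

```python
def aic(deviance, n_params):
    """AIC = model deviance + 2 * (number of estimated parameters)."""
    return deviance + 2 * n_params

# Hypothetical example: model A fits a bit better (lower deviance) but
# uses five more parameters than the simpler model B.
print(aic(deviance=100.0, n_params=8))  # 116.0
print(aic(deviance=105.0, n_params=3))  # 111.0  <- the simpler model wins
```

Note how the parameter penalty lets the slightly worse-fitting but simpler model come out ahead, which is exactly the deviance-vs-complexity balance described below.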
In essence, AIC is calculated so that the "best" model balances decreasing the deviance of the model from the data (which we want) against keeping the model simple and/or relevant. The deviance will be small if the covariates (explanatory variables) are "good", but also if we simply include a ton of lousy covariates. AIC therefore penalizes excessive covariates by adding 2 * (number of parameters) (i.e., the betas estimated for each covariate and covariate interaction).

Whether to use AIC, R-squared, or both comes down to the model design and the results you are after. Do you want to explain the current state of a system and show which covariates are important? Minimize AIC. Do you want to predict future changes in the system? Maximize R-squared. This is my view from a statistics perspective, since I haven't studied model selection in a biological setting.

Thank you very much,
Sarah Fann

"Education is what survives when what has been learnt has been forgotten." - Fortune cookie

________________________________________
From: Ecological Society of America: grants, jobs, news [[email protected]] On Behalf Of Scott Crosbie [[email protected]]
Sent: Wednesday, February 10, 2010 11:47 AM
To: [email protected]
Subject: Re: [ECOLOG-L] AIC, data-dredging, and inappropriate stats

Hi eco-loggers,

There seems to be some confusion about what AIC and goodness-of-fit (GOF) statistics are for. I am NOT a statistician, but my understanding of the difference between AIC and GOF statistics is as follows: AIC should be used to judge the trade-off between precision and bias of a given model, and used in model selection only if relevant GOF statistics fail to reject the null hypothesis (that what is "expected" by the model does not significantly depart from what is "observed" in the data).
Sometimes AIC is referred to as "fit", but model "fitness" should first be quantified by GOF statistics, some of which make different assumptions of normality and/or focus on certain areas of the model and data to quantify fitness (see the differences between chi-squared, Kolmogorov-Smirnov, Cramér–von Mises, etc.). Which GOF statistics to focus on will depend partly on what assumptions you make in the process of data gathering and modeling.

Scott Crosbie

--- On Wed, 2/10/10, Michael Cooperman <[email protected]> wrote:

From: Michael Cooperman <[email protected]>
Subject: Re: [ECOLOG-L] AIC, data-dredging, and inappropriate stats - correction
To: [email protected]
Date: Wednesday, February 10, 2010, 6:14 AM

Hello fellow Ecologers-

Yesterday I offered a response to the original posting on issues surrounding the use of AIC. Sadly, I failed to proof what I wrote and thereby submitted some wrong stuff. I wrote, "if 2 different measures of fit (i.e., delta AIC value and r^2) support different conclusions...." Of course, delta AIC is not a measure of goodness of fit; it is a measure of the "quality" (i.e., information loss) of a given model in comparison to other tested models.

Hence, in my response to point 3 of the original post, which read:

3. Use of other 'fit' statistics along with the model-selection approach. I often see people reporting other statistics (e.g. p-values, r-squared) in combination with the AIC scores. My statistician friend says that this is totally inappropriate and uninformative.

My response should have been: since delta AIC and r^2 measure different things, I think it can be appropriate to report them together; not as equal measures for model selection, but with r^2 informing on the relative value of the AIC solution. That is, if AIC indicates model X is the best, but model X has an exceptionally low r^2 (assuming r^2 is suitable to use because the relationship is linear), then even the best model identified by AIC is still pretty weak.

Sorry for any confusion.

Michael
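Michael's point can be sketched numerically. In this minimal Python illustration, all AIC scores, observations, and fitted values are hypothetical, chosen only to show that the model ranked best by delta AIC can still explain little of the variance:

```python
def delta_aic(aics):
    """Delta AIC for each model: its AIC minus the minimum AIC in the set."""
    best = min(aics)
    return [a - best for a in aics]

def r_squared(y, y_hat):
    """Coefficient of determination for observed y vs fitted y_hat."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical AIC scores for three candidate models:
aics = [211.4, 213.0, 219.8]
print([round(d, 1) for d in delta_aic(aics)])  # [0.0, 1.6, 8.4]

# The first model "wins" on AIC, but in absolute terms it may still fit poorly:
y     = [1.0, 2.0, 3.0, 4.0]   # hypothetical observations
y_hat = [2.4, 2.5, 2.6, 2.7]   # hypothetical fitted values from that model
print(round(r_squared(y, y_hat), 3))  # 0.188 -- low r^2 despite delta AIC = 0
```

Delta AIC is purely relative (it compares candidates against each other), while r^2 is an absolute measure of explained variance, which is why reporting both can be informative rather than redundant.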
