Re: [ECOLOG-L] AIC, data-dredging, and inappropriate stats

Jordan Mayor Fri, 12 Feb 2010 14:12:15 -0800

Could you please clarify the following statement, Sarah? As written, it may
confuse early practitioners of AIC-based model selection: "Finally, unlike
Rsq, the order in which variables enter the model affects the AIC
tremendously."


I have found that in my multiple regression models I get EXACTLY the same
AIC scores regardless of the order in which my variables are entered, at
least to four significant digits, and the same estimates to seven significant
digits. Also, surely you didn't mean the order in which the models themselves
were introduced to the statistics package, as Burnham & Anderson's 1998 book
clearly states in section 2.11.2: "*Order Not Important In Computing AIC
Values*".
Perhaps I misunderstood your last sentence, however.
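
A minimal sketch of this check, assuming ordinary least squares with Gaussian
errors and simulated data (the function name ols_aic and the data are
illustrative, not taken from anyone's analysis in this thread). Reordering the
columns of the design matrix leaves the residual sum of squares unchanged, so
the AIC should agree up to floating-point rounding:

```python
import numpy as np

def ols_aic(X, y):
    """AIC = -2*logLik + 2*k for a Gaussian linear model fit by least squares.

    k counts the intercept, one slope per column of X, and the error variance.
    """
    n = len(y)
    design = np.column_stack([np.ones(n), X])    # prepend an intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    sigma2 = rss / n                             # maximum-likelihood variance estimate
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = design.shape[1] + 1
    return -2 * loglik + 2 * k

rng = np.random.default_rng(0)                   # simulated data, illustration only
X = rng.normal(size=(50, 3))
y = 1.0 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=50)

aic_original = ols_aic(X, y)
aic_reordered = ols_aic(X[:, [2, 0, 1]], y)      # same covariates, different order
print(aic_original, aic_reordered)
```

Software that reports a different AIC after reordering would be fitting a
different model (for example, a stepwise procedure that drops terms), not the
same model entered in a different order.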

Also, no one would argue that an "AIC of 300" has any information value in
and of itself (unlike an R2 of 0.89). AIC values are in fact on an arbitrary
scale and are affected by sample size. To reinforce this point, Burnham &
Anderson (2004) state that they have seen them range from -600 to 340,000!
That is why everyone instead uses delta i, the difference between each
model's AIC and that of the "best" (lowest-AIC) model, as a scaling measure
to compare models. A delta i of 2 vs. 20 is quite meaningful in that context.
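
As a small illustration of why the differences matter and the raw values do
not, here is a sketch that computes delta i for a set of candidate models. The
model names and AIC values are hypothetical, chosen only to match the numbers
mentioned above, and aic_deltas is an illustrative helper name:

```python
def aic_deltas(aic_by_model):
    """Return delta_i = AIC_i - min(AIC) for each candidate model."""
    best = min(aic_by_model.values())
    return {name: aic - best for name, aic in aic_by_model.items()}

# Hypothetical AIC values for three candidate models fit to the same data.
aics = {"model_a": 300.0, "model_b": 302.0, "model_c": 320.0}
deltas = aic_deltas(aics)

# Shifting every AIC by the same constant (an arbitrary change of scale)
# leaves the deltas untouched.
shifted = aic_deltas({name: aic + 340000.0 for name, aic in aics.items()})

print(deltas)
print(shifted == deltas)
```

The comparison is only meaningful when all candidate models are fit to the
same response data.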

Thanks,
Jordan

On Fri, Feb 12, 2010 at 2:39 PM, Fann, Sarah Lynn <[email protected]> wrote:

> Dear Ecolog readers,
>
> As Brian and others have pointed out, I made a poor choice of words when I
> used the phrase "future changes". Rsq is powerful for predicting responses
> within your range of data, but is completely invalid for predicting outside
> the observed range of data. For example, you have growth data for fishes
> ages 2, 6, 7, and 10 - Rsq will help you choose a model that most accurately
> estimates the size data for age 3 fish, but it would be invalid to use the
> same model to predict the size of fish at age 11. I hope this clarifies what
> I meant by "predict".
>
> With regard to AIC, you still have the same "predictive" issues that you
> would with Rsq. Any measure of model appropriateness will be with respect to
> your current dataset. As a measure of the predictive quality of a model, I
> would argue that AIC is very inappropriate. Although it is true that
> minimizing AIC will help select the best variables to describe the dataset
> without selecting a copious number of variables, it doesn't describe how well
> the model generated from these parameters "fits" the data. I can't tell you
> what an AIC of 300 means with regard to the data, but I know that an Rsq
> value of .89 means the model explains about 89% of the variance in the data.
> Similarly, a model with an AIC of 300 might have an Rsq of .40 or an Rsq of
> .90; it comes down to how the variables AIC selects are being used to make a
> model/predictive equation. Finally, unlike Rsq, the order in which
> variables enter the model affects the AIC tremendously.
>
> I hope this clarifies my earlier comments.
>
> Thanks,
>
> Sarah Fann
>
> "Education is what survives when what has been learnt has been forgotten."
>
> -   Fortune cookie
> ________________________________________
> From: Ecological Society of America: grants, jobs, news [
> [email protected]] On Behalf Of Brian R. Mitchell [
> [email protected]]
> Sent: Thursday, February 11, 2010 10:03 PM
> To: [email protected]
> Subject: Re: [ECOLOG-L] AIC, data-dredging, and inappropriate stats
>
> Hello ecolog,
>
> I disagree with the suggestion that maximizing R2 is a good way to
> predict future changes to a system... maximizing R2 may produce a
> perfect fit to your current data set, but you are fitting to the noise
> as well as the signal, and such a model will likely perform poorly with
> new data.  I think that if you want to have predictive power, you should
> probably still use a parsimonious approach like AIC, since this will
> tend to reject covariates that only have a small impact on the model's
> predictive power.
>
> Brian Mitchell
> > Date:    Wed, 10 Feb 2010 16:36:18 -0500
> > From:    "Fann, Sarah Lynn" <[email protected]>
> > Subject: Re: AIC, data-dredging, and inappropriate stats
> >
> > Dear ecology,
> >
> > AIC =  model deviance + 2*(# of parameters).
> >
> > In essence, AIC is calculated so that the "best" model balances decreasing
> > the deviance of the model from the data (we want this) against keeping the
> > model simple and/or relevant. The deviance will be small if the covariates
> > (explanatory variables) are "good" or if we have a ton of lousy covariates.
> > Thus AIC penalizes excessive covariates by adding 2*(# of parameters), i.e.
> > your Betas, which are estimated for each covariate and covariate
> > interaction.
> >
> > Whether or not to use AIC, Rsq, or both comes down to the model design
> > and the results you are after. Do you want to explain the current state of
> > a system and show which covariates are important? Minimize AIC. Do you want
> > to predict future changes in the system? Maximize R2.
> >
> > This is my view from a Statistics perspective since I haven't studied
> > model selection in a biological setting.
> >
> > Thank you very much,
> >
> > Sarah Fann
>
>
