Wirt, as usual, makes clear and powerful arguments. I offer the  
following responses in the spirit of discussion.

On Mar 5, 2006, at 7:04 PM, [EMAIL PROTECTED] wrote:

> The discussion I was having -- primarily with myself, I suspect --  
> was in
> making the choice between the goodness-of-fit component of the AIC  
> and its
> cost-of-complexity penalty. It is possible that you may have  
> multiple minima that
> are closely valued. At that point, you have the opportunity to  
> choose which
> attributes of your various models you deem most appropriate to your  
> circumstance,
> quality of prediction or simplicity.

Absolutely. As I see it, the fundamental contribution of Akaike was  
to define the penalty for complexity in the same 'units' as the  
measure of goodness-of-fit, namely, probability. The advantage of  
this goes beyond having a simple number to quantify a model, and  
therefore a means to compare multiple models. It also allows a  
quantified assessment of an ensemble of models, in terms of their  
relative plausibility of being the 'correct' model. A sensible use of  
AIC is not the picking of a single 'best' model (and ignoring the  
rest), but rather a discussion of the meaning of the various  
candidate models. Wirt is right that "not all models are equal in  
their value to us". One can also use the plausible set to generate  
importance values for various parameters ('important' ones tend to  
occur in multiple models), and so on. In my limited experience, this  
is where most insight may be gained.
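
To make this concrete, here is a minimal sketch (in Python, with made-up log-likelihoods and parameter counts, so purely illustrative) of how AIC puts fit and complexity on the same scale, and how AIC differences translate into Akaike weights, i.e. the relative plausibility of each candidate model:

```python
import numpy as np

# Hypothetical fit results for four candidate models: maximized
# log-likelihood and number of estimated parameters (illustrative values).
log_lik  = np.array([-120.4, -118.9, -118.2, -117.8])
n_params = np.array([2, 3, 4, 6])

# Classic AIC: goodness-of-fit and the complexity penalty in the same 'units'.
aic = 2 * n_params - 2 * log_lik

# Differences from the best (lowest) AIC, and Akaike weights: the relative
# plausibility of each model being the best of the candidate set.
delta   = aic - aic.min()
weights = np.exp(-0.5 * delta) / np.exp(-0.5 * delta).sum()

for k, a, w in zip(n_params, aic, weights):
    print(f"k = {k}: AIC = {a:.1f}, weight = {w:.2f}")
```

Summing the weights of every model that contains a given parameter gives the kind of 'importance value' mentioned above.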

As for the arbitrariness of Akaike's '2k' term, it has inevitably been debated ever since it was proposed, and there are, of course, many alternative penalty terms in the literature. I myself like the approach of Bozdogan, who bases the penalty on the complexity of either the covariance matrix or the inverse Fisher information matrix of the model. But whether you think one approach or another is better, all of them have the advantage I outlined in the previous paragraph, of putting Occam's razor into a quantitative form that one can then learn much from. In this respect, it is no different from a physicist choosing a model that he knows is a bit 'too simple' for the problem at hand, but which has the advantage of analytical tractability.

Finally, Wirt distinguishes between hypothesis testing and what is  
commonly called 'data-mining', meaning the use of computers to search  
for unanticipated patterns in large datasets. Wirt uses the example  
of neural networks, but I think this is a bit of a red herring. Most neural networks are, indeed, 'black boxes' from which one cannot gain insight into the methods of discrimination between datasets. For that reason, they are rarely used in a traditional statistical sense. But typically that is not the goal. As an example, my wife's website at http://research.amnh.org/invertzoo/spida/ uses neural networks to identify
spider species from images. The goal of the system is not to learn  
what distinguishes species -- that has already been done by skilled  
systematists. The goal is to encapsulate their knowledge (expressed  
as a published taxonomy) in a form that can accept new data (in the  
form of images of specimens). The networks are tools for  
disseminating knowledge, not for discovering it. This seems to me to  
be a more typical, and perfectly valid, use of neural networks.

Data mining in and of itself is dangerous for the simple reason that,  
with sufficient data, some unlikely patterns will occur by chance. A  
computer has no way of knowing what is sensible and what isn't, but a  
researcher does. It is a researcher's 'insight' that leads him or her  
to formulate a priori hypotheses in the first place. That same  
insight should lead to a careful examination of the results of any  
data-mining type exercise. To be worthy of examination, a discovered  
pattern should fulfil criteria other than simple existence. What is  
disturbing is when a researcher is 'seduced' by the appearance of a pattern into ignoring common sense and over-interpreting the result. But this is not, perhaps, the fault of the
computer that found the pattern. There may be true, interesting  
patterns hidden in data that only a computer can find. The trick is  
to separate the real from the spurious.
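
To illustrate the 'patterns by chance' point with a toy simulation (not any particular data-mining method; the numbers are arbitrary): screening enough unrelated variables will reliably turn up nominally 'significant' correlations in pure noise.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Pure noise: 50 observations of 200 mutually independent variables.
n_obs, n_vars = 50, 200
data = rng.normal(size=(n_obs, n_vars))

# Naively 'mine' the data: test every variable against the first one.
false_hits = 0
for j in range(1, n_vars):
    r, p = stats.pearsonr(data[:, 0], data[:, j])
    if p < 0.05:
        false_hits += 1

# With 199 tests at alpha = 0.05, roughly ten spurious 'patterns' are expected.
print(f"'Significant' correlations found in pure noise: {false_hits}")
```

The point is not that the computer is wrong -- each p-value is correctly computed -- but that 'unlikely' events become expected when enough of them are tried.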

None of this is to dispute Wirt's general concerns, which are good  
ones, but rather to offer a counterpoint. I think AIC and its
relatives are useful tools when used correctly, and wouldn't want  
them to be 'undersold' either!

Gareth

=====================================================================
Gareth J. Russell

Department of Mathematical Sciences (Division of Biological Sciences)
New Jersey Institute of Technology

Department of Biological Sciences
Rutgers University

  Phones: (973) 642-4299 (NJIT)
          (973) 353-1429 (Rutgers)
     Fax: (973) 596-5591

E-mails: [EMAIL PROTECTED]
          [EMAIL PROTECTED]

     WWW: http://web.njit.edu/~russell
=====================================================================
