Wirt, as usual, makes clear and powerful arguments. I offer the
following responses in the spirit of discussion.
On Mar 5, 2006, at 7:04 PM, [EMAIL PROTECTED] wrote:
> The discussion I was having -- primarily with myself, I suspect -- was in
> making the choice between the goodness-of-fit component of the AIC and its
> cost-of-complexity penalty. It is possible that you may have multiple
> minima that are closely valued. At that point, you have the opportunity
> to choose which attributes of your various models you deem most
> appropriate to your circumstance, quality of prediction or simplicity.
Absolutely. As I see it, the fundamental contribution of Akaike was
to define the penalty for complexity in the same 'units' as the
measure of goodness-of-fit, namely, probability. The advantage of
this goes beyond having a simple number to quantify a model, and
therefore a means to compare multiple models. It also allows a
quantified assessment of an ensemble of models, in terms of their
relative plausibility of being the 'correct' model. A sensible use of
AIC is not the picking of a single 'best' model (and ignoring the
rest), but rather a discussion of the meaning of the various
candidate models. Wirt is right that "not all models are equal in
their value to us". One can also use the plausible set to generate
importance values for various parameters ('important' ones tend to
occur in multiple models), and so on. In my limited experience, this
is where most insight may be gained.
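To make this concrete, here is a minimal sketch in Python (the AIC values are invented for illustration) of computing Akaike weights, which express each candidate model's relative plausibility within the ensemble:

    import math

    # Hypothetical AIC values for three candidate models (illustrative only).
    aics = {'model_A': 102.3, 'model_B': 103.1, 'model_C': 110.8}

    # Delta-AIC: each model's distance from the best (lowest) AIC.
    best = min(aics.values())
    deltas = {m: a - best for m, a in aics.items()}

    # Akaike weights: exp(-delta/2), normalised to sum to 1.
    rel_lik = {m: math.exp(-0.5 * d) for m, d in deltas.items()}
    total = sum(rel_lik.values())
    weights = {m: r / total for m, r in rel_lik.items()}

    for m in sorted(weights, key=weights.get, reverse=True):
        print(f"{m}: delta = {deltas[m]:.2f}, weight = {weights[m]:.3f}")

In this made-up example, model_A and model_B come out comparably plausible, so discarding model_B outright would throw away real information. Summing the weights of all models containing a given parameter is one simple way to get the parameter-importance values mentioned above.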
In terms of the arbitrariness of Akaike's '2k' term, this has
inevitably been debated ever since it was proposed, and there are, of
course, many alternative penalty terms in the literature. I myself
like the approach of Bozdogan, who bases the penalty on the
complexity of either the covariance matrix or the inverse Fisher
information matrix of the model. But whether you think one approach
or another is better, all of them have the advantage I outlined in
the previous paragraph, of putting Occam's razor into a quantitative
form that one can then learn much from. In this respect, it is no
different from a physicist choosing a model that he knows is a bit
'too simple' for the problem at hand, but which has the advantage of
analytical tractability.
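For the curious, here is a sketch of Bozdogan's ICOMP as I understand it (the exact form varies across his papers, so treat this as illustrative rather than definitive): the 2k term is replaced by an information-theoretic complexity measure of the estimated inverse Fisher information matrix.

    import numpy as np

    def c1_complexity(sigma):
        # Bozdogan's C1 measure: (s/2) log(trace(sigma)/s) - (1/2) log det(sigma),
        # where s is the dimension of sigma. It is zero when all eigenvalues
        # are equal and grows as they become more unbalanced.
        s = sigma.shape[0]
        sign, logdet = np.linalg.slogdet(sigma)
        return 0.5 * s * np.log(np.trace(sigma) / s) - 0.5 * logdet

    def icomp(minus2_log_lik, inv_fisher):
        # ICOMP = -2 log L + 2 * C1(estimated inverse Fisher information).
        return minus2_log_lik + 2.0 * c1_complexity(inv_fisher)

The appeal is that the penalty responds not just to how many parameters a model has, but to how correlated and unevenly scaled their estimates are.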
Finally, Wirt distinguishes between hypothesis testing and what is
commonly called 'data-mining', meaning the use of computers to search
for unanticipated patterns in large datasets. Wirt uses the example
of neural networks, but I think this is a bit of a red herring. Most
are, indeed, 'black boxes' from which one cannot gain insight into
the methods of discrimination between datasets. For that reason, they
are rarely used in a traditional statistical sense. But typically
that is not the goal. As an example, my wife's website at
http://research.amnh.org/invertzoo/spida/ uses neural networks to identify
spider species from images. The goal of the system is not to learn
what distinguishes species -- that has already been done by skilled
systematists. The goal is to encapsulate their knowledge (expressed
as a published taxonomy) in a form that can accept new data (in the
form of images of specimens). The networks are tools for
disseminating knowledge, not for discovering it. This seems to me to
be a more typical, and perfectly valid, use of neural networks.
Data mining in and of itself is dangerous for the simple reason that,
with sufficient data, some unlikely patterns will occur by chance. A
computer has no way of knowing what is sensible and what isn't, but a
researcher does. It is a researcher's 'insight' that leads him or her
to formulate a priori hypotheses in the first place. That same
insight should lead to a careful examination of the results of any
data-mining type exercise. To be worthy of examination, a discovered
pattern should fulfil criteria other than simple existence. What is
disturbing is when a researcher is 'seduced' by the appearance of a
pattern into ignoring common sense and over-interpreting the result.
But this is not, perhaps, the fault of the
computer that found the pattern. There may be true, interesting
patterns hidden in data that only a computer can find. The trick is
to separate the real from the spurious.
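A toy illustration of the danger (a Python sketch; the dataset is pure noise, so every 'pattern' in it is spurious by construction):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_obs, n_vars = 100, 50
    data = rng.standard_normal((n_obs, n_vars))  # pure noise, no structure

    # Mine every pair of variables for a correlation at the usual 0.05 level.
    hits, n_tests = 0, 0
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            r, p = stats.pearsonr(data[:, i], data[:, j])
            n_tests += 1
            if p < 0.05:
                hits += 1

    print(f"{hits} 'significant' correlations out of {n_tests} tests")
    # Roughly 5% of the 1225 tests (about 60) will come out 'significant'
    # by chance alone.

Any of those sixty-odd 'discoveries' would look exactly like a real pattern to the computer; only the researcher's insight can tell them apart.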
None of this is to dispute Wirt's general concerns, which are good
ones, but rather to offer a counterpoint. I think AIC and its
relatives are useful tools when used correctly, and wouldn't want
them to be 'undersold' either!
Gareth
=====================================================================
Gareth J. Russell
Department of Mathematical Sciences (Division of Biological Sciences)
New Jersey Institute of Technology
Department of Biological Sciences
Rutgers University
Phones: (973) 642-4299 (NJIT)
(973) 353-1429 (Rutgers)
Fax: (973) 596-5591
E-mails: [EMAIL PROTECTED]
[EMAIL PROTECTED]
WWW: http://web.njit.edu/~russell
=====================================================================