Information-Theoretic Approaches to Empirical Science
Courses can be scheduled in late summer or fall.
Instructor: David R. Anderson
These courses present a new science paradigm based on Information Theory.
Kullback-Leibler information is the basis for model selection leading to
Akaike's Information Criterion (AIC). The course deals with science
philosophy as much as with data analysis and model selection. The focus is on
quantitative evidence for multiple science hypotheses. This general
approach includes ranking the science hypotheses; examination of the
probability of hypothesis j, given the data; and evidence ratios. Once
these concepts have been presented, the discussion shifts to making formal
inference from all the hypotheses and their models (multimodel inference).
Additional details can be viewed at
www.informationtheoryworkshop.com
Key Outcomes: Attendees will have a good understanding of these new
approaches and be able to perform analyses with their own data. The
computations required are quite simple once the parameter estimates have
been obtained for each model.
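As a hedged illustration of how simple these computations are, the sketch below computes AIC values, AIC differences (deltas), and Akaike weights from a set of maximized log-likelihoods and parameter counts. The three models and all numbers are hypothetical, and Python is used only for illustration; the course does not prescribe any particular software.

```python
import math

# Hypothetical maximized log-likelihoods and parameter counts (K) for
# three candidate models; the values are illustrative only.
models = {"H1": (-112.4, 3), "H2": (-110.1, 5), "H3": (-118.9, 2)}

def aic(log_lik, k):
    # AIC = -2 log(L) + 2K
    return -2.0 * log_lik + 2.0 * k

scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
best = min(scores.values())

# Delta values (difference from the best model) and Akaike weights,
# interpretable as model probabilities, given the data.
deltas = {name: s - best for name, s in scores.items()}
raw = {name: math.exp(-0.5 * d) for name, d in deltas.items()}
total = sum(raw.values())
weights = {name: r / total for name, r in raw.items()}
```

Once the parameter estimates (and hence the log-likelihoods) are in hand for each model, everything else is arithmetic of this kind.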
Target Audience: Graduate students, post-docs, faculty, and researchers in
various agencies and institutes. Anyone whose work involves hypothesizing
and modelling, and whose inferences are model based, will gain from this
material.
Background Required: Attendees should have a solid background in
statistical principles and modelling (this is NOT a modelling course). The
course focuses on science, science philosophy, information and evidence.
The amount of mathematics or statistics presented in the course is
relatively meager; however, without a good understanding of linear and
nonlinear regression, least squares and maximum likelihood estimation, one
will struggle to understand some of the material to be presented.
Why Take This Course? A substantial paradigm shift is occurring in our
science and resource management. The past century relied on null hypothesis
testing, asymptotic distributions of the test statistic, P-values and a
ruling concerning significant or not significant. Under this analysis
paradigm a test statistic (T) is computed from the data. The P-value is the
focus of the analysis and is Prob{T or more extreme, given the null
hypothesis}. With this definition in mind, we can abbreviate slightly to
Prob{X|Ho}, where it is understood that X represents the observed data or
more extreme (unobserved) data.
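The definition Prob{T or more extreme, given Ho} can be made concrete with a small simulation. In this hedged sketch, the observed test statistic and the standard-normal null distribution are purely illustrative; the point is only that the P-value is a probability about the data (and more extreme, unobserved data), not about the hypothesis.

```python
import random

random.seed(1)

# Illustrative observed test statistic (hypothetical value).
t_obs = 2.1

# Simulate the distribution of T under the null hypothesis; here Ho is
# taken to imply a standard normal distribution for T (an assumption
# made only for this sketch).
null_draws = [random.gauss(0.0, 1.0) for _ in range(100_000)]

# Two-sided P-value: the fraction of null draws at least as extreme
# as the observed statistic, i.e., Prob{T or more extreme | Ho}.
p_value = sum(abs(t) >= abs(t_obs) for t in null_draws) / len(null_draws)
```

Note that nothing in this computation involves the alternative hypothesis; the probability statement is conditional on Ho throughout.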
The null hypothesis (Ho) takes center stage but is often trivial or even
silly. The alternative hypothesis (HA) is not the subject of the test;
support for the alternative occurs only if the P-value (for the null
hypothesis) is low (often < 0.05). Support for the alternative hypothesis
comes by default, and only when Prob{data|Ho} is low.
The proper interpretation of the P-value is quite strained; this might
explain why so many people mistakenly take it to mean something quite
different (i.e., the probability that the null hypothesis is true). This is
not what is meant by a P-value.
These traditional methods are being replaced by information-theoretic
methods (and to a lesser extent, at least at this time, by a variety of
Bayesian methods). These approaches focus on an a priori set of plausible
science hypotheses
H1, H2, …, HR.
Evidence for or against members of this set of multiple working hypotheses
consists of (1) the likelihood of each hypothesis, given the data, L(Hj|X),
or (2) a set of probabilities, Prob{H1, H2, …, HR, given the data}, or
Prob{Hj|X}. These likelihoods and probabilities are direct evidence, where
evidence = information = -entropy.
Simple evidence ratios allow a measure of the formal strength of evidence
for any two science hypotheses. Note the radical difference in the
probability statements (above) stemming from either a P-value or the
probability of hypothesis j. Statistical inference should be about models
and parameters, conditional on the data; P-values, however, are probability
statements about the data, conditional on the null hypothesis.
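An evidence ratio between two hypotheses follows directly from their AIC differences. The sketch below uses illustrative delta values only; it shows the standard form of the computation, not results from any particular analysis.

```python
import math

def evidence_ratio(delta_i, delta_j):
    # E_ij = L(H_i|X) / L(H_j|X) = exp(-delta_i/2) / exp(-delta_j/2),
    # where delta is the AIC difference from the best model.
    return math.exp(-0.5 * delta_i) / math.exp(-0.5 * delta_j)

# Example: the best model (delta = 0) versus a model with delta = 4;
# the evidence favors the best model by a factor of exp(2), about 7.4.
ratio = evidence_ratio(0.0, 4.0)
```

The same ratio can equivalently be computed from the Akaike weights, as w_i / w_j.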
These new approaches (including Bayesian methods) allow statistical
inference to be based on all (or some) of the models in the a priori set,
leading to a robust class of methods termed multimodel inference. That
is, the inference is based on all the models in the set. Alternative
science hypotheses take center stage in these approaches and will require
much more attention than in the past century (where one started with an
alternative and the null was merely nothing, or the naïve position: thus,
little scientific thinking was called for).
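One common form of multimodel inference is a model-averaged parameter estimate, weighting each model's estimate by its Akaike weight. This hedged sketch uses made-up weights and estimates simply to show the shape of the calculation.

```python
# Hypothetical Akaike weights (summing to 1) and per-model estimates
# of a parameter theta; all numbers are illustrative only.
weights = [0.57, 0.42, 0.01]
estimates = [3.1, 2.8, 4.0]

# Model-averaged estimate: a weighted sum over all models in the set,
# so the inference rests on the whole set rather than a single model.
theta_bar = sum(w * t for w, t in zip(weights, estimates))
```

Because every model in the a priori set contributes in proportion to its weight, the resulting inference is robust to the choice of any single "best" model.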
The set of science hypotheses evolves through time as implausible
hypotheses are eventually dropped from consideration, new hypotheses are
added, and existing hypotheses are further refined. Rapid progress in the
theoretical or applied sciences can be realized as this set evolves, based
on careful inferences from new data. This is an exciting time to be in
science or science-based management. There are important philosophies
involved here: these approaches go well beyond methods for just data
analysis.
The course will make use of the textbook,
Anderson, D. R. 2008. Model based inference in the life sciences:
a primer on evidence. Springer, New York, NY. 184 pp.
This book is included in the registration fee.
If you are interested in hosting a course at your location, please contact
me.
David R. Anderson
August 13, 2013
[email protected]