Optimality of Universal Bayesian Sequence Prediction
for General Loss and Alphabet
JMLR 4(Nov):971-1000, 2003
Abstract
Various optimality properties of universal sequence predictors based on
Bayes-mixtures in general, and Solomonoff's prediction scheme in
particular, will be studied. The probability of observing x_t at time t,
given past observations x_1...x_{t-1}, can be computed with the chain rule
if the true generating distribution μ of the sequences x_1 x_2 x_3 ... is
known. If μ is unknown, but known to belong to a countable or continuous
class M, one can base one's prediction on the Bayes-mixture ξ, defined as
a w_ν-weighted sum or integral of the distributions ν in M. The cumulative
expected loss of the Bayes-optimal universal prediction scheme based on ξ
is shown to be close to the loss of the Bayes-optimal, but infeasible,
prediction scheme based on μ. We show that the bounds are tight and that
no other predictor can lead to significantly smaller bounds.
Furthermore, for various performance measures, we show Pareto-optimality
of ξ and give an Occam's razor argument that the choice w_ν ~ 2^{-K(ν)}
for the weights is optimal, where K(ν) is the length of the shortest
program describing ν. The results are applied to games of chance, defined
as a sequence of bets, observations, and rewards. The prediction schemes
(and bounds) are compared to the popular predictors based on expert
advice. Extensions to infinite alphabets, partial, delayed and
probabilistic prediction, classification, and more active systems are
briefly discussed.
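The chain rule and the Bayes-mixture described above can be sketched in a
few lines of code. The following is a minimal illustration, not code from
the paper: the class M is taken to be three Bernoulli environments, the
prior weights are uniform, and the example sequence is invented. After each
observation the mixture updates the weights w_ν by Bayes' rule, so that
predicting with ξ sequentially (the chain rule) concentrates on the
environment closest to the true μ.

```python
class BayesMixture:
    """Bayes-mixture predictor xi over a finite class M of Bernoulli environments.

    Illustrative sketch: each nu in M is a Bernoulli parameter p = nu(x_t = 1),
    and weights holds the prior w_nu (positive, summing to one).
    """

    def __init__(self, params, weights):
        self.params = params
        self.weights = list(weights)

    def predict(self, x):
        # Mixture probability xi(x_t = x | x_<t) = sum over nu of w_nu * nu(x),
        # where w_nu is the current (posterior) weight of environment nu.
        return sum(w * (p if x == 1 else 1 - p)
                   for w, p in zip(self.weights, self.params))

    def update(self, x):
        # Bayes' rule: the posterior weight of nu is proportional to w_nu * nu(x).
        # Iterating predict/update over x_1, x_2, ... realizes the chain rule
        # xi(x_1...x_n) = prod_t xi(x_t | x_1...x_{t-1}).
        px = self.predict(x)
        self.weights = [w * (p if x == 1 else 1 - p) / px
                        for w, p in zip(self.weights, self.params)]

# Hypothetical example: M = {Bernoulli(0.2), Bernoulli(0.5), Bernoulli(0.8)}
# with uniform prior; the sequence looks as if drawn from mu = Bernoulli(0.8).
mix = BayesMixture([0.2, 0.5, 0.8], [1 / 3, 1 / 3, 1 / 3])
for x in [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]:
    mix.update(x)
print(round(mix.predict(1), 3))  # next-symbol probability under xi
```

With a countable class and weights w_ν ~ 2^{-K(ν)} this same posterior
update is what makes the mixture's predictions converge to those based on
the true μ.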
----------------------------------------------------------------------------
This paper, and all previous papers in Volume 4, are available
electronically at http://www.jmlr.org in PostScript and PDF formats. The
papers of Volumes 1, 2 and 3 are also available electronically from the
JMLR website, and in hardcopy from the MIT Press; please see
http://www.jmlr.org for details.