Re: Optimal Prediction

```
The theory of inductive inference is Bayesian, of course.
But Bayes' rule by itself does not yield Occam's razor.

Suppose x represents the history of our universe up until now.
What is its most likely continuation y? Let us write xy for
the entire history - the concatenation of x and y. Bayes just
says: P(xy | x) = P(x | xy) P(xy) / N(x), where N(x) is
a normalizing constant. Since xy begins with x, P(x | xy) = 1,
so our conditional probability is proportional to the prior
probability P(xy).

Hence, according to Bayes, what you put in is what you get
out.  If your prior P(z) were high for simple z then you'd
get Occam's razor: simple explanations preferred.

But why should P favor simple z?  Where does Occam's razor
really come from? The essential work on this subject has
been done in statistical learning theory, not in physics.

Some have restricted P by making convenient Gaussian
assumptions. Such restrictions yield specific variants
of Occam's razor.

But the most compelling approach is much broader than that.
It just assumes that P is computable. That you can formally
write it down. That there is a program that takes as input
past observations and possible future observations,
and computes conditional probabilities of the latter.
(Gaussian assumptions are a very special case thereof.)

The computability assumption seems weak but is strong enough
to yield a very general form of Occam's razor. It naturally
leads to what is known as the universal prior, which dominates
Gaussian and other computable priors up to a constant factor.
And Hutter's recent loss bounds show that it does not hurt much
to predict according to the universal prior instead of the true
but unknown distribution, as long as the latter is computable.

I believe physicists and other inductive scientists really
should become aware of this. It is essential to what they are
doing. And much more formal and concrete than Popper's
frequently cited but non-quantitative ideas on falsifiability.

Juergen Schmidhuber            http://www.idsia.ch/~juergen/

```
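
To make the Bayes step in the post concrete, here is a minimal toy sketch. It is not the universal prior, which cannot be computed exactly. It assumes a tiny hypothesis class (binary patterns repeated forever) and a hypothetical prior of 2^(-2 * pattern length), both chosen purely for illustration. The names `patterns`, `prior`, `generates` and `predict_next` are made up for this example. The point is only that the conditional probability of a continuation is the prior mass of the consistent hypotheses, divided by a normalizer that plays the role of N(x).

```python
from itertools import product


def patterns(max_len):
    """Yield every binary pattern of length 1..max_len (the toy hypothesis class)."""
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)


def prior(pattern):
    """Assumed simplicity prior: shorter patterns get exponentially more weight."""
    return 2.0 ** (-2 * len(pattern))


def generates(pattern, x):
    """True if repeating `pattern` forever reproduces the observed history x."""
    return all(x[i] == pattern[i % len(pattern)] for i in range(len(x)))


def predict_next(x, max_len=8):
    """Return P(next bit = '1' | x) under the toy prior."""
    weight = {"0": 0.0, "1": 0.0}
    for p in patterns(max_len):
        if generates(p, x):
            # Each consistent hypothesis votes for its own next bit,
            # weighted by its prior probability.
            weight[p[len(x) % len(p)]] += prior(p)
    total = weight["0"] + weight["1"]  # plays the role of N(x)
    if total == 0.0:
        raise ValueError("no hypothesis in the toy class is consistent with x")
    return weight["1"] / total


if __name__ == "__main__":
    # After 010101 the short pattern "01" dominates, so '1' is very unlikely next.
    print(predict_next("010101"))
    # After 111111 the short pattern "1" dominates, so '1' is very likely next.
    print(predict_next("111111"))
```

With this prior the first call returns a value close to 0 and the second a value close to 1: what you put in (a preference for short patterns) is what you get out. Replacing `prior` with a flat weighting over patterns of one fixed length longer than the history would make both continuations equally likely, which is the sense in which Bayes' rule alone does not yield Occam's razor.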