I browsed through recent postings and hope
this delayed but self-contained message can clarify
a few things about probabilities and measures
and predictability etc.

What is the probability of an integer being, say,
a square? This question does not make sense without
a prior probability distribution on the integers.

This prior cannot be uniform. Try to find one!
Under _any_ distribution some integers must be
more likely than others.

Which prior is good?  Is there a `best' or
`universal' prior? Yes, there is. It assigns to
each integer n as much probability as any other
computable prior, save for a constant factor that
does not depend on n.  (A computable prior can be
encoded as a program that takes n as input and
outputs n's probability, e.g., a program that
implements Bernoulli's formula, etc.)

Given a set of priors, a universal prior is
essentially a weighted sum of all priors in the
set. For example, Solomonoff's famous weighted sum
of all enumerable priors will assign at least as
much probability to any square integer as any
other computable prior, save for a constant
machine-dependent factor that becomes less and
less relevant as the integers get larger and

Now let us talk about all computable universe
histories. Some are finite, some infinite. Each
has at least one program that computes it. Again
there is _no_ way of assigning equal probability
to all of them! Many are tempted to assume a
uniform distribution without thinking much about
it, but there is _no_ such thing as a uniform
distribution on all computable universes, or on
all axiomatic mathematical structures, or on
all logically possible worlds, etc!
(Side note: There only is a uniform _measure_ on
the finitely many possible history _beginnings_
of a given size, each standing for an uncountable
_set_ of possible futures. Probabilities
refer to single objects, measures to sets.)

It turns out that we can easily build universal
priors using Levin's important concept of self-
delimiting programs. Such programs may
occasionally execute the instruction "request new
input bit"; the bit is chosen randomly and will
remain fixed thereafter. Then the probability of
some universe history is the probability of guessing
a program for it. This probability is `universal'
as it does not depend much on the computer (whose
negligible influence can be buried in a constant
universe-independent factor). Some programs halt or
go in an infinite loop without ever requesting
additional input bits. Universes with at least one
such short self-delimiting program are more probable
than others.

To make predictions about some universe, say,
ours, we need a prior as well. For instance,
most people would predict that next Tuesday it
won't rain on the moon, although there are
computable universes where it does. The anthropic
principle is an _insufficient_ prior that does not
explain the absence of rain on the moon - it does
assign cumulative probability 1.0 to the set of all
universes where we exist, and 0.0 to all the other
universes, but humans could still exist if it did
rain on the moon occasionally. Still, many tend to
consider the probability of such universes as small,
which actually says something about their prior.

We do not know yet the true prior from which
our universe is sampled - Schroedinger's wave
function may be an approximation thereof. But it
turns out that if the true prior is computable
at all, then we can in principle already predict
near-optimally, using the universal prior instead:
Many really smart physicists do not know this
yet. Technical issues and limits of computable
universes are discussed in papers available at:
Even stronger predictions using a prior based
on the fastest programs (not the shortest):

-Juergen Schmidhuber

Reply via email to