I browsed through recent postings and hope this delayed but self-contained message can clarify a few things about probabilities and measures and predictability etc.

What is the probability of an integer being, say, a square? This question does not make sense without a prior probability distribution on the integers.

This prior cannot be uniform. Try to find one! Under _any_ distribution some integers must be more likely than others.

Which prior is good? Is there a `best' or `universal' prior? Yes, there is. It assigns to each integer n as much probability as any other computable prior, save for a constant factor that does not depend on n. (A computable prior can be encoded as a program that takes n as input and outputs n's probability, e.g., a program that implements Bernoulli's formula, etc.)

Given a set of priors, a universal prior is essentially a weighted sum of all priors in the set. For example, Solomonoff's famous weighted sum of all enumerable priors will assign at least as much probability to any square integer as any other computable prior, save for a constant machine-dependent factor that becomes less and less relevant as the integers get larger and larger.

Now let us talk about all computable universe histories. Some are finite, some infinite. Each has at least one program that computes it. Again there is _no_ way of assigning equal probability to all of them! Many are tempted to assume a uniform distribution without thinking much about it, but there is _no_ such thing as a uniform distribution on all computable universes, or on all axiomatic mathematical structures, or on all logically possible worlds, etc! (Side note: There only is a uniform _measure_ on the finitely many possible history _beginnings_ of a given size, each standing for an uncountable _set_ of possible futures. Probabilities refer to single objects, measures to sets.)

It turns out that we can easily build universal priors using Levin's important concept of self- delimiting programs. Such programs may occasionally execute the instruction "request new input bit"; the bit is chosen randomly and will remain fixed thereafter. Then the probability of some universe history is the probability of guessing a program for it. This probability is `universal' as it does not depend much on the computer (whose negligible influence can be buried in a constant universe-independent factor). Some programs halt or go in an infinite loop without ever requesting additional input bits. Universes with at least one such short self-delimiting program are more probable than others.

To make predictions about some universe, say, ours, we need a prior as well. For instance, most people would predict that next Tuesday it won't rain on the moon, although there are computable universes where it does. The anthropic principle is an _insufficient_ prior that does not explain the absence of rain on the moon - it does assign cumulative probability 1.0 to the set of all universes where we exist, and 0.0 to all the other universes, but humans could still exist if it did rain on the moon occasionally. Still, many tend to consider the probability of such universes as small, which actually says something about their prior.

We do not know yet the true prior from which our universe is sampled - Schroedinger's wave function may be an approximation thereof. But it turns out that if the true prior is computable at all, then we can in principle already predict near-optimally, using the universal prior instead: http://www.idsia.ch/~juergen/unilearn.html Many really smart physicists do not know this yet. Technical issues and limits of computable universes are discussed in papers available at: http://www.idsia.ch/~juergen/computeruniverse.html Even stronger predictions using a prior based on the fastest programs (not the shortest): http://www.idsia.ch/~juergen/speedprior.html

`-Juergen Schmidhuber`