The theory of inductive inference is Bayesian, of course. But Bayes' rule by itself does not yield Occam's razor.
Suppose x represents the history of our universe up until now. What is its most likely continuation y? Let us write xy for the entire history - the concatenation of x and y. Bayes' rule just says: P(xy | x) = P(x | xy) P(xy) / N(x), where N(x) is a normalizing constant (here simply P(x)). Since x is a prefix of xy, we have P(x | xy) = 1, so the conditional probability is proportional to the prior probability P(xy). Hence, according to Bayes, what you put in is what you get out. If your prior P(z) were high for simple z, then you would get Occam's razor: simple explanations preferred. But why should P favor simple z? Where does Occam's razor really come from?

The essential work on this subject has been done in statistical learning theory, not in physics. Some have restricted P by making convenient Gaussian assumptions; such restrictions yield specific variants of Occam's razor. But the most compelling approach is much broader than that. It just assumes that P is computable - that you can formally write it down, that there is a program which takes past observations and possible future observations as input and computes conditional probabilities of the latter. (Gaussian assumptions are a very special case thereof.)

The computability assumption seems weak, but it is strong enough to yield a very general form of Occam's razor. It naturally leads to what is known as the universal prior, which dominates Gaussian and all other computable priors (see the definition and toy sketch below). And Hutter's recent loss bounds show that it does not hurt much to predict according to the universal prior instead of the true but unknown distribution, as long as the latter is computable.

I believe physicists and other inductive scientists really should become aware of this. It is essential to what they are doing, and it is much more formal and concrete than Popper's frequently cited but non-quantitative ideas on falsifiability.
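To spell out the term "universal prior" (this is standard material, following Solomonoff): let U be a universal monotone Turing machine and l(p) the length in bits of a program p. Then

\[
M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-l(p)},
\qquad
M(y \mid x) \;=\; \frac{M(xy)}{M(x)},
\]

where the sum runs over the minimal programs p whose output starts with x. Every additional bit of program length halves a program's weight, so continuations computable by short programs automatically dominate the mixture - a built-in Occam's razor.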
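Purely as an illustration of the mechanism, here is a toy Python sketch of prediction with such a program mixture. It is a caricature, not the real M: M requires a universal machine and is only semicomputable, whereas the toy machine below simply repeats its program forever, and all names in the sketch are made up for this example.

```python
# Toy caricature of prediction with a universal-prior-style mixture.
# A "program" is a bitstring; the (deliberately non-universal) toy machine
# outputs its program repeated forever. Program p gets weight 2^-l(p).

from itertools import product

MAX_PROG_LEN = 12  # enumerate all toy programs of length 1..12

def run(program: str, n: int) -> str:
    """First n output bits of the toy machine: the program repeated forever."""
    return (program * n)[:n]

def M(x: str) -> float:
    """Total weight of all enumerated programs whose output starts with x."""
    total = 0.0
    for length in range(1, MAX_PROG_LEN + 1):
        for bits in product("01", repeat=length):
            p = "".join(bits)
            if run(p, len(x)) == x:
                total += 2.0 ** -length
    return total

def predict_next(x: str) -> dict:
    """Conditional probabilities M(xb) / M(x) for each possible next bit b."""
    mx = M(x)
    return {b: M(x + b) / mx for b in "01"}

print(predict_next("010101"))  # -> {'0': 0.888..., '1': 0.111...}
```

The simple periodic continuation '0' receives 8/9 of the probability mass here, because short consistent programs such as "01" outweigh all the longer programs that predict '1'. Note that with this toy machine every consistent program predicts exactly one next bit, so M(x0) + M(x1) = M(x) and the predictions form a proper distribution; the real M is only a semimeasure.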
Juergen Schmidhuber
http://www.idsia.ch/~juergen/