Hi Juergen,
Unfortunately I don't know Solomonoff's work very well. But I know rather well the work by Putnam, Case and Smith, Royer, Blum, etc. on identification in the limit ("limiting inference"), which is part of COLT too (COLT = computational learning theory = theoretical artificial intelligence). A nice and rare book exists on the subject: S. Jain, D. Osherson, J. S. Royer, A. Sharma, 1999, Systems that Learn, MIT Press.
Those works rely heavily on Kleene's second recursion theorem.
In the long version of my thesis I used such COLT results to motivate the "consciousness theory" on which the psycho/physics reversal is based. (Consciousness is seen there as the result of an instinctive anticipation of self-consistency.) Unfortunately or fortunately (I don't know), Solovay's general use of the second recursion theorem in his proof of the arithmetical completeness of the logics G and G* allowed me to bypass any explicit use of both Kleene's theorem and the COLT results.
But I agree with you about the general importance of theoretical approaches to inductive inference.
For the sake of the others, let me just define here what an inductive inference machine (IIM) is in the Blum, Case and Smith approach. (Some people tend to think it is impossible to define such a concept!)
First, let us define what it means to say that a machine identifies a phenomenon. Let us (with comp) model a phenomenon by a total (everywhere defined) computable function.
We will say that a machine M identifies a phenomenon f if, when we present the input-output pairs of f to M one at a time, but in *any* order, M outputs (Gödel numbers of) programs and eventually converges on a program for f. (That is, from some point on the machine always outputs the same program, and that program computes f.)
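For readers who like it spelled out, here is one standard way to put that criterion in symbols (the notation is mine: phi_e is the e-th partial computable function, and sigma_m is the sequence of the first m pairs presented):

\[
M \text{ identifies } f \;\iff\; \text{for every presentation order of the graph of } f:\;
\exists n\, \exists e\ \big[\forall m \ge n:\ M(\sigma_m) = e\big] \ \wedge\ \varphi_e = f .
\]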
At first sight this looks like a rather stupid definition. Indeed, *every* total computable function is identifiable by some machine. Take the factorial, for example: the IIM which *always* outputs a code for the factorial trivially identifies the factorial!
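Here is a toy sketch in Python of that trivial machine (the names and the representation of "programs" as source strings are mine, just to fix ideas):

# A toy sketch (mine): an IIM is a function from the data seen so far to a program.
# Here a "program" is simply a piece of source text.

FACTORIAL_CODE = """
def f(n):
    return 1 if n <= 1 else n * f(n - 1)
"""

def constant_iim(pairs_seen_so_far):
    """The 'stupid' IIM: whatever pairs it has seen, it conjectures the factorial."""
    return FACTORIAL_CODE

# Fed the graph of the factorial in any order, it converges (immediately) on a
# correct program, so it identifies the singleton class {factorial}.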
The interesting notion is the identification of a *class* of functions. What we have just seen is that any singleton class, like {factorial}, {fibonacci}, etc., is identifiable.
Is the class {factorial, fibonacci} identifiable? Of course: by dovetailing! The machine outputs a program for one of the two functions, computes both factorial and fibonacci by dovetailing on the data it receives, and, at the first discrepancy, switches once and for all to the program that the data have not contradicted.
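A small Python sketch of such a machine (my own illustration; since both functions are total we can simply compute them, so the dovetailing stays hidden here):

def factorial(n):
    return 1 if n <= 1 else n * factorial(n - 1)

def fibonacci(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def two_function_iim(pairs):
    """pairs: the (x, f(x)) pairs presented so far, in any order.
    Conjecture factorial until some pair contradicts it, then fibonacci."""
    if all(factorial(x) == y for x, y in pairs):
        return "a code for factorial"
    return "a code for fibonacci"

# If the data really comes from factorial the conjecture never changes; if it
# comes from fibonacci, a contradicting pair eventually shows up (e.g. (3, 2)),
# and from then on the conjecture is fixed: EX-identification of the class.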
For that reason, as Blum showed, any recursively enumerable (RE) class of total computable functions is easily identifiable: the class being RE, you can build a dovetailer which runs the programs of the enumeration on the inputs received so far, and conjectures the first member of the class not yet contradicted by the data. This is really a (quite impractical) dovetailing over the search space.
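In Python, this "identification by enumeration" strategy looks roughly like this (a sketch under the assumption that the RE class is handed to us as a list of total programs, here plain callables):

def enumeration_iim(enumeration, pairs):
    """enumeration: total programs (Python callables) enumerating the class.
    pairs: the (x, f(x)) pairs presented so far.
    Conjecture the index of the first program consistent with the data."""
    for i, prog in enumerate(enumeration):
        if all(prog(x) == y for x, y in pairs):
            return i
    # Unreachable when the data really comes from a member of the class.

# Once the data has contradicted every wrong program preceding a correct one in
# the enumeration, the conjecture stops changing, which is EX-identification.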
Now, any reader of [Diagonalisation 1:
http://www.escribe.com/science/theory/m3079.html
Diagonalisation 1 (answers)
http://www.escribe.com/science/theory/m3344.html ]
knows that the set R of all (codes of) total computable functions is not RE. So the natural question is: does there exist a machine capable of identifying every total computable function? Putnam showed that such a universal learning machine does not exist.
Case and Smith then looked at whether a weakening of the identification criterion can help. Suppose we allow the machine to converge on a program which computes the phenomenon f except on at most one input, i.e. the final program may make 0 or 1 mistake. Well, if the "or" is non-constructive, this genuinely helps.
"EX" is the name of the collection of classes of functions identifiable by some machine, and EX1 is the corresponding name for this weakened criterion.
EXi: the same, where we allow 0 or 1 or 2 or ... or i errors in the final program (non-constructive "or": with a constructive "or" the errors could always be repaired!).
EX*: the same, but we allow a finite yet unbounded number of errors.
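In the usual notation (mine here, but standard in this literature): write f =^i g when f and g disagree on at most i inputs, and f =^* g when they disagree on at most finitely many inputs. Then:

\[
S \in EX_i \;\iff\; \exists M\ \forall f \in S:\ M \text{ on } f \text{ converges to some } e \text{ with } \varphi_e =^{i} f ,
\]
\[
S \in EX^{*} \;\iff\; \exists M\ \forall f \in S:\ M \text{ on } f \text{ converges to some } e \text{ with } \varphi_e =^{*} f .
\]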
Case and Smith showed that EX is strictly included in EX1, which is strictly included in EX2, etc. The union of all the EXi is itself strictly included in EX*.
Alas! R is still not in EX*.
But look! Let us allow the IIM to produce an infinite number of programs (still one at a time), and let us weaken the identification criterion by just saying that the IIM identifies f if, after some time, all the programs it outputs compute f. (They may be different programs; this is not the same as eventually always outputting the same program.) Let BC (for "behaviorally correct") be the collection of classes of functions identifiable by some machine in this new way. The crazy result is that EX* is included in BC. And, with the obvious notation, Case and Smith have shown that BC is strictly included in BC1, itself strictly included in BC2, etc. Here also the union of all the BCi is strictly included in BC*.
And (the cherry on the cake): R is included in BC*. So in principle there is a universal learning machine. But this result is highly theoretical, resting on unavoidably non-constructive existence proofs ...
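Just to gather in one line what has been said above (nothing more than that; the inclusions marked strict are exactly the ones claimed above):

\[
EX \subset EX_1 \subset EX_2 \subset \cdots \subset \bigcup_i EX_i \subset EX^{*}
\subseteq BC \subset BC_1 \subset \cdots \subset \bigcup_i BC_i \subset BC^{*},
\qquad R \in BC^{*} \setminus EX^{*} .
\]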
Still, this is very enlightening for showing how subtle "cleverness" is (when measured by the bigness of the identifiable classes of functions).
There is also the NON-UNION theorem of Blum and Blum. In a nutshell, the theorem says that there is something *much* more (even non-computably more) clever than one anticipating machine! What? Answer: two machines! And then 3 machines, 4 machines, etc.
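The precise statement behind that slogan, as I recall it (so take this as a hedged paraphrase), is simply:

\[
\exists\, S_1, S_2 \in EX \ \text{ such that } \ S_1 \cup S_2 \notin EX .
\]

So a team of two machines, one identifying S1 and the other S2, covers a class of functions that no single machine can identify, and iterating the argument gives the teams of 3, 4, ... machines.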
The reason why I do not need to elaborate on this learning-machine theory is that I only need the fact that G* is anticipable by the G machine, and this follows trivially from the decidability of G*, which in turn follows from Solovay's work.
But I agree with Juergen: there are plenty of beautiful and entertaining results in COLT. There are a lot of identification criteria: the EX and BC ones above, but also PEX (Popperian identification), Occam-style identification, etc.
Bruno
------------------------------
Original message by Juergen
Physics is an inductive science. You try to find a law that matches
the data, and hope that it generalizes on unseen data.
When asked about the nature of their inductive business, most
physicists refer to Popper's ideas from the 1930s (falsifiability
etc), and sometimes to Occam's razor: simple explanations are better
explanations.
Most physicists, however, are not even aware of the fact that there
is a formal basis for their inductive science, provided by the field
of computational learning theory (COLT), in particular, the theory of
universal induction.
The contents of the following COLT paper may be old news to some
on this list.
------------------------------------------------------------------
The Speed Prior: a new simplicity measure yielding near-optimal
computable predictions (Juergen Schmidhuber, IDSIA)
In J. Kivinen and R. H. Sloan, eds, Proc. 15th Annual Conf. on
Computational Learning Theory (COLT), 216-228, Springer, 2002;
based on section 6 of http://arXiv.org/abs/quant-ph/0011122 (2000)
http://www.idsia.ch/~juergen/speedprior.html
ftp://ftp.idsia.ch/pub/juergen/colt.ps
Solomonoff's optimal but noncomputable method for inductive
inference assumes that observation sequences x are drawn from a
recursive prior distribution mu(x). Instead of using the unknown
mu(x) he predicts using the celebrated universal enumerable prior
M(x) which for all x exceeds any recursive mu(x), save for a
constant factor independent of x. The simplicity measure M(x)
naturally implements "Occam's razor" and is closely related to the
Kolmogorov complexity of x. However, M assigns high probability to
certain data x that are extremely hard to compute. This does not match
our intuitive notion of simplicity. Here we suggest a more plausible
measure derived from the fastest way of computing data. In absence
of contrarian evidence, we assume that the physical world is generated
by a computational process, and that any possibly infinite sequence of
observations is therefore computable in the limit (this assumption is
more radical and stronger than Solomonoff's). Then we replace M by the
novel Speed Prior S, under which the cumulative a priori probability
of all data whose computation through an optimal algorithm requires
more than O(n) resources is 1/n. We show that the Speed Prior allows
for deriving a computable strategy for optimal prediction of future
y, given past x. Then we consider the case that the data actually
stem from a nonoptimal, unknown computational process, and use
Hutter's recent results to derive excellent expected loss bounds for
S-based inductive inference. Assuming our own universe is sampled
from S, we predict: it won't get many times older than it is now;
large scale quantum computation won't work well; beta decay is not
random but due to some fast pseudo-random generator which we should
try to discover.
Juergen Schmidhuber http://www.idsia.ch/~juergen

