**Re: colt**

Hi Juergen,

I don't know very well Solomonoff's work unfortunately.

But I know rather well the work by Putnam, Case and Smith,
Royer,

Blum etc. in limiting inference identification theories, which
are

part of colt too. (colt = computational learning theory =
theoretical

artificial intelligence). A nice and rare book exists on the
subject:

S.Jain, D.Osherson, J.S.Royer, A.Sharma, 1999, Systems that
Learn,

MIT Press.

Those works rely heavily on Kleene second recursion
theorem.

In the long version of my thesis, I have used such colt-results
to

motivate the "consciousness theory" on which the
reversal psycho/physics

is based. (Consciousness is seen as the result of instinctive
anticipation

of self-consistency). Unfortunately or fortunately (I don't
know), the

general use of the second recursion theorem by Solovay in its
proof of

the arithmetical completeness of the logic G and G* allowed me
to

bypasse explicit use of both Kleene's theorem and colt
results.

But I agree with you of the general importance of
theoretical

approaches in Inductive Inference theories.

For the sake of others let me just define here what is, in
Blum

Case and Smith approaches, an inductive inference machine (IIM).
(Some

people tend to think it is impossible to define such a
concept!).

First let us define what it means to say a machine
identifies

a phenomena. Let us (with comp) define a phenomenon by a
total

(everywhere defined) computable function.

We will say that a machine identify a phenomenon f, if when

we present, one at a time but in *any* order, the f input-output,
to

the machine M, M output (godel number code of) programs, and
eventually

converges on a program for f. (That is the machines
eventually

always output the same program).

Apparently this is a rather stupid definition. Indeed *all*
total

computable function are identifiable by a machine. For example,
take

the factorial. The IIM which *always* output a code for
factorial

identify trivially the factorial!

The interesting notion is the identification of a *class* of
programs

or functions. What we have just seen is that any singleton class,
like

{factorial}, {fibonacci}, etc. are identifiable.

Is the set {factorial, fibonacci} identifiable? Of course: by
dovetailing!

The machine outputs programs for factorial and fibonacci,
computes

them both by dovetailing, and stop to output one of them after
the

first discrepancy.

For that reason, as Blum showed, any Recursively Enumerable (RE)
class

of total function is easily identifiable. The class being RE, you
can

build a dovetailer running their executions and tested the
member

of the class. This is really a (non practical) dovetailing on
the

search space.

Now any reader of [Diagonalisation 1:
http://www.escribe.com/science/theory/m3079.html

Diagonalisation 1 (answers)
http://www.escribe.com/science/theory/m3344.html ]

know that the set R of all code of total computable functions is
not

RE. So the natural question is: does it exist a machine capable
of

identifying any total computable function? Putnam Showed that
such a

universal learning machine does not exist.

Case and Smith have then looked if a weakening of the
identification

criterion can help. Suppose we make it possible to the machine to
output

a program computing the code of the phenomenon f, but allowing
that program

to make 0 or 1 mistake. Well if the "or" is non
constructive, this works.

"EX" is the name of the collection of set of functions
identifiable by a

machine, and EX1 is the corresponding name for the weakened
criterion.

EXi: the same, where we allow 0 or 1 or 2 or ... or i errors
(non-constructive

or: with constructive "or" the errors can always be
recovered!) in the output

of the IIM.

EX*: the same but we allow a finite but undeterminate, number of
errors.

Case and Smith showed that EX is strictly include in EX1, which
is strictly

include in EX2, etc. The union of all EXi is itself strictly
include in EX*.

Alas! R is still not in EX*.

But look! Let us allow the IIM to produce an infinite numbers
(still one at

a time) of programs, and let us weakening the identification
criterion by

just saying the IIM identify f, if, after some time it produces
programs

(different one) computing f. (This is not the same as producing
always

the same programs). Let BC be the collection of set of
functions

identifiable by a machine in this new way. The crazy result is
that

EX* is included in BC. And, with obvious notation, Case and Smith
have

shown that BC is strictly include in BC1, itself st. incl. in
BC2, etc.

Here also the union of all BCi is st. include in BC*.

And (the cake's cherry): R is included in BC*. In principle,
there is

a universal learning machine. But this result is highly
theoretical, full

of necessary non constructive existence ....

Still very enlightening for showing how cleverness (defined by
the

bigness of identifiable classes of set of functions) is
subtle.

There is also the NON-UNION theorem of BLUM and BLUM. In a
nutshell,

the theorem say that there is something is *very* more (even
non

computably more) clever than a anticipating machine! What?
Answer:

two machines! And then 3 machines, 4 machines, etc...

The reason why I do not need to elaborate on this learning
machine

theory, is that I need only the fact that G* is anticipable by
the G

machine, and this follows trivially from the decidability of G*,
which

follows from Solovay's work.

But I agree with Juergen. There is plenty of beautiful and

entertaining results in colt. There are a lot of
identification

criteria the EX, BC like above, but also PEX (Popperian id),
OCCAM sort

of identification, etc.

Bruno

------------------------------

Original message by Juergen

Physics is an inductive science. You try to find a law that matches

the data, and hope that it generalizes on unseen data.

When asked about the nature of their inductive business, most

physicists refer to Popper's ideas from the 1930s (falsifiability

etc), and sometimes to Occam's razor: simple explanations are better

explanations.

Most physicists, however, are not even aware of the fact that there

is a formal basis for their inductive science, provided by the field

of computational learning theory (COLT), in particular, the theory of

universal induction.

The contents of the following COLT paper may be old news to some

on this list.

------------------------------------------------------------------

The Speed Prior: a new simplicity measure yielding near-optimal

computable predictions (Juergen Schmidhuber, IDSIA)

In J. Kivinen and R. H. Sloan, eds, Proc. 15th Annual Conf. on

Computational Learning Theory (COLT), 216-228, Springer, 2002;

based on section 6 of http://arXiv.org/abs/quant-ph/0011122 (2000)

http://www.idsia.ch/~juergen/speedprior.html

ftp://ftp.idsia.ch/pub/juergen/colt.ps

Solomonoff's optimal but noncomputable method for inductive

inference assumes that observation sequences x are drawn from an

recursive prior distribution mu(x). Instead of using the unknown

mu(x) he predicts using the celebrated universal enumerable prior

M(x) which for all x exceeds any recursive mu(x), save for a

constant factor independent of x. The simplicity measure M(x)

naturally implements "Occam's razor" and is closely related to the

Kolmogorov complexity of x. However, M assigns high probability to

certain data x that are extremely hard to compute. This does not match

our intuitive notion of simplicity. Here we suggest a more plausible

measure derived from the fastest way of computing data. In absence

of contrarian evidence, we assume that the physical world is generated

by a computational process, and that any possibly infinite sequence of

observations is therefore computable in the limit (this assumption is

more radical and stronger than Solomonoff's). Then we replace M by the

novel Speed Prior S, under which the cumulative a priori probability

of all data whose computation through an optimal algorithm requires

more than O(n) resources is 1/n. We show that the Speed Prior allows

for deriving a computable strategy for optimal prediction of future

y, given past x. Then we consider the case that the data actually

stem from a nonoptimal, unknown computational process, and use

Hutter's recent results to derive excellent expected loss bounds for

S-based inductive inference. Assuming our own universe is sampled

from S, we predict: it won't get many times older than it is now;

large scale quantum computation won't work well; beta decay is not

random but due to some fast pseudo-random generator which we should

try to discover.

Juergen Schmidhuber http://www.idsia.ch/~juergen