Re: colt

Bruno Marchal Tue, 13 Aug 2002 05:32:44 -0700

Title: Re: colt

Hi Juergen,

I don't know very well Solomonoff's work unfortunately.

But I know rather well the work by Putnam, Case and Smith, Royer,

Blum etc. in limiting inference identification theories, which are

part of colt too. (colt = computational learning theory = theoretical

artificial intelligence). A nice and rare book exists on the subject:

S.Jain, D.Osherson, J.S.Royer, A.Sharma, 1999, Systems that Learn,

MIT Press.

Those works rely heavily on Kleene second recursion theorem.

In the long version of my thesis, I have used such colt-results to

motivate the "consciousness theory" on which the reversal psycho/physics

is based. (Consciousness is seen as the result of instinctive anticipation

of self-consistency). Unfortunately or fortunately (I don't know), the

general use of the second recursion theorem by Solovay in its proof of

the arithmetical completeness of the logic G and G* allowed me to

bypasse explicit use of both Kleene's theorem and colt results.

But I agree with you of the general importance of theoretical

approaches in Inductive Inference theories.

For the sake of others let me just define here what is, in Blum

Case and Smith approaches, an inductive inference machine (IIM). (Some

people tend to think it is impossible to define such a concept!).

First let us define what it means to say a machine identifies

a phenomena. Let us (with comp) define a phenomenon by a total

(everywhere defined) computable function.

We will say that a machine identify a phenomenon f, if when

we present, one at a time but in *any* order, the f input-output, to

the machine M, M output (godel number code of) programs, and eventually

converges on a program for f. (That is the machines eventually

always output the same program).

Apparently this is a rather stupid definition. Indeed *all* total

computable function are identifiable by a machine. For example, take

the factorial. The IIM which *always* output a code for factorial

identify trivially the factorial!

The interesting notion is the identification of a *class* of programs

or functions. What we have just seen is that any singleton class, like

{factorial}, {fibonacci}, etc. are identifiable.

Is the set {factorial, fibonacci} identifiable? Of course: by dovetailing!

The machine outputs programs for factorial and fibonacci, computes

them both by dovetailing, and stop to output one of them after the

first discrepancy.

For that reason, as Blum showed, any Recursively Enumerable (RE) class

of total function is easily identifiable. The class being RE, you can

build a dovetailer running their executions and tested the member

of the class. This is really a (non practical) dovetailing on the

search space.

Now any reader of [Diagonalisation 1: http://www.escribe.com/science/theory/m3079.html

Diagonalisation 1 (answers) http://www.escribe.com/science/theory/m3344.html ]

know that the set R of all code of total computable functions is not

RE. So the natural question is: does it exist a machine capable of

identifying any total computable function? Putnam Showed that such a

universal learning machine does not exist.

Case and Smith have then looked if a weakening of the identification

criterion can help. Suppose we make it possible to the machine to output

a program computing the code of the phenomenon f, but allowing that program

to make 0 or 1 mistake. Well if the "or" is non constructive, this works.

"EX" is the name of the collection of set of functions identifiable by a

machine, and EX1 is the corresponding name for the weakened criterion.

EXi: the same, where we allow 0 or 1 or 2 or ... or i errors (non-constructive

or: with constructive "or" the errors can always be recovered!) in the output

of the IIM.

EX*: the same but we allow a finite but undeterminate, number of errors.

Case and Smith showed that EX is strictly include in EX1, which is strictly

include in EX2, etc. The union of all EXi is itself strictly include in EX*.

Alas! R is still not in EX*.

But look! Let us allow the IIM to produce an infinite numbers (still one at

a time) of programs, and let us weakening the identification criterion by

just saying the IIM identify f, if, after some time it produces programs

(different one) computing f. (This is not the same as producing always

the same programs). Let BC be the collection of set of functions

identifiable by a machine in this new way. The crazy result is that

EX* is included in BC. And, with obvious notation, Case and Smith have

shown that BC is strictly include in BC1, itself st. incl. in BC2, etc.

Here also the union of all BCi is st. include in BC*.

And (the cake's cherry): R is included in BC*. In principle, there is

a universal learning machine. But this result is highly theoretical, full

of necessary non constructive existence ....

Still very enlightening for showing how cleverness (defined by the

bigness of identifiable classes of set of functions) is subtle.

There is also the NON-UNION theorem of BLUM and BLUM. In a nutshell,

the theorem say that there is something is *very* more (even non

computably more) clever than a anticipating machine! What? Answer:

two machines! And then 3 machines, 4 machines, etc...

The reason why I do not need to elaborate on this learning machine

theory, is that I need only the fact that G* is anticipable by the G

machine, and this follows trivially from the decidability of G*, which

follows from Solovay's work.

But I agree with Juergen. There is plenty of beautiful and

entertaining results in colt. There are a lot of identification

criteria the EX, BC like above, but also PEX (Popperian id), OCCAM sort

of identification, etc.

Bruno

------------------------------

Original message by Juergen

Physics is an inductive science. You try to find a law that matches
the data, and hope that it generalizes on unseen data.

When asked about the nature of their inductive business, most
physicists refer to Popper's ideas from the 1930s (falsifiability
etc), and sometimes to Occam's razor: simple explanations are better
explanations.

Most physicists, however, are not even aware of the fact that there
is a formal basis for their inductive science, provided by the field
of computational learning theory (COLT), in particular, the theory of
universal induction.

The contents of the following COLT paper may be old news to some
on this list.

------------------------------------------------------------------

The Speed Prior: a new simplicity measure yielding near-optimal
computable predictions (Juergen Schmidhuber, IDSIA)

In J. Kivinen and R. H. Sloan, eds, Proc. 15th Annual Conf. on
Computational Learning Theory (COLT), 216-228, Springer, 2002;

based on section 6 of http://arXiv.org/abs/quant-ph/0011122 (2000)
http://www.idsia.ch/~juergen/speedprior.html
ftp://ftp.idsia.ch/pub/juergen/colt.ps

Solomonoff's optimal but noncomputable method for inductive
inference assumes that observation sequences x are drawn from an
recursive prior distribution mu(x). Instead of using the unknown
mu(x) he predicts using the celebrated universal enumerable prior
M(x) which for all x exceeds any recursive mu(x), save for a
constant factor independent of x. The simplicity measure M(x)
naturally implements "Occam's razor" and is closely related to the
Kolmogorov complexity of x. However, M assigns high probability to
certain data x that are extremely hard to compute. This does not match
our intuitive notion of simplicity. Here we suggest a more plausible
measure derived from the fastest way of computing data. In absence
of contrarian evidence, we assume that the physical world is generated
by a computational process, and that any possibly infinite sequence of
observations is therefore computable in the limit (this assumption is
more radical and stronger than Solomonoff's). Then we replace M by the
novel Speed Prior S, under which the cumulative a priori probability
of all data whose computation through an optimal algorithm requires
more than O(n) resources is 1/n. We show that the Speed Prior allows
for deriving a computable strategy for optimal prediction of future

y, given past x. Then we consider the case that the data actually
stem from a nonoptimal, unknown computational process, and use
Hutter's recent results to derive excellent expected loss bounds for
S-based inductive inference. Assuming our own universe is sampled
from S, we predict: it won't get many times older than it is now;
large scale quantum computation won't work well; beta decay is not
random but due to some fast pseudo-random generator which we should
try to discover.

Juergen Schmidhuber http://www.idsia.ch/~juergen

Re: colt

Reply via email to