Hi Ed,

So is the real significance of the universal prior not its probability
> value in a given probability space (which seems relatively unimportant,
> provided it is not one or close to zero), but rather the fact that it
> can model almost any kind of probability space?
>

It just takes a binary string as input.  If you can express your problem
as one in which a binary string represents what has been observed so far,
and a continuation of this string represents what happens next, then
Solomonoff induction can deal with it.  So you don't have to "pick the
space".  You do, however, have to take your problem, represent it as
binary data, and feed it in, just as you do when you put any kind of data
into a computer.
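
To make that concrete, here is a small sketch (my own illustration, not
from the original discussion) of serializing ordinary data into the kind
of binary string Solomonoff induction takes as input, by expanding UTF-8
bytes into bits:

```python
# Hedged illustration: any observation sequence can be serialized into a
# binary string, here by expanding each UTF-8 byte into its 8 bits.
def to_bits(s: str) -> str:
    return "".join(f"{b:08b}" for b in s.encode("utf-8"))

print(to_bits("Hi"))  # 16 bits: 01001000 01101001
```

The encoding itself is arbitrary; what matters is only that observations
become a growing prefix of one binary string.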

The power of the universal prior comes from the fact that it takes all
computable distributions into account.  In a sense it contains all
well-defined hypotheses about what the structure in the string could be.
This is a point that is worth contemplating for a while.  If there is any
structure in there, and this structure can be described by a program on a
computer, even a probabilistic one, then it's already factored into the
universal prior and the Solomonoff predictor is already taking it into
account.

> How does Kolmogorov complexity help deal with this problem?
>

The key thing that Kolmogorov complexity provides is a weighting for each
hypothesis in the universal prior that decreases exponentially with the
complexity of the hypothesis: a hypothesis of complexity K gets weight
2^-K.  This means that the Solomonoff predictor respects, in some sense,
the principle of Occam's razor.  That is, a priori, simpler things are
considered more likely than complex ones.
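
As a toy illustration of this weighting (my own sketch; real Kolmogorov
complexity is incomputable, so here "program length" is just the length
of a repeating bit pattern), shorter descriptions get exponentially more
prior mass:

```python
from itertools import product

# Hedged toy "universal prior": treat each binary pattern p as a program
# that generates the infinite repetition p p p ..., and give it weight
# 2^(-len(p)).  This is NOT Kolmogorov complexity, just an illustration
# of Occam-style exponential weighting by description length.
def toy_prior(max_len: int = 4) -> dict:
    prior = {}
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            p = "".join(bits)
            prior[p] = 2.0 ** (-len(p))
    total = sum(prior.values())
    return {p: w / total for p, w in prior.items()}  # normalize to sum 1

prior = toy_prior()
# The one-bit "program" "0" gets more a-priori mass than the
# four-bit "program" "0110".
```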

ED######> ??Shane??, what are the major ways programs are used in a
> Solomonoff machine?  Are they used for generating and matching patterns? Are
> they used for generating and creating context specific instantiations of
> behavioral patterns?
>
Keep in mind that Solomonoff induction is not computable.  It is not an
algorithm.  The role that programs play is that they are used to
"construct" the universal prior.  Once this is done, the Solomonoff
predictor just takes the prior and conditions on the string observed so
far to work out the distribution over the next bit.  That's all.
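
The conditioning step can be sketched in the same toy setting as above
(my own hedged illustration, using repeating bit patterns as a stand-in
for programs): hypotheses that contradict the observed prefix get zero
posterior weight, and the survivors' predictions are mixed according to
their 2^(-length) weights.

```python
from itertools import product

# Hedged toy Solomonoff-style predictor: each "hypothesis" p is a binary
# pattern emitting p repeated forever, with prior weight 2^(-len(p)).
# Assumes at least one hypothesis matches the observed string.
def predict_next(observed: str, max_len: int = 4) -> dict:
    posterior = {}
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            p = "".join(bits)
            stream = p * (len(observed) // len(p) + 2)
            # Exact matching: a hypothesis that contradicts the data
            # gets posterior weight zero, i.e. is effectively discarded.
            if stream.startswith(observed):
                posterior[p] = 2.0 ** (-len(p))
    # Mix the surviving hypotheses' predictions for the next bit.
    probs = {"0": 0.0, "1": 0.0}
    for p, w in posterior.items():
        stream = p * (len(observed) // len(p) + 2)
        probs[stream[len(observed)]] += w
    z = probs["0"] + probs["1"]
    return {b: v / z for b, v in probs.items()}

# After observing "010101", the surviving hypotheses all continue with 0.
print(predict_next("010101"))
```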


>Lukasz######> The programs are generally required to exactly match in AIXI
> (but not in AIXItl I think).
> ED######> ??Shane??, could you please give us an assist on this one? Is
> exact matching required?  And if so, is this something that could be
> loosened in a real machine?
>
Exact pattern matching is required in the sense that if a hypothesis says
that something cannot happen, and it does, then that hypothesis is
effectively discarded.

A real machine might have to loosen this, and many other things.  Note
that nobody I know is trying to build a real AGI machine based on
Solomonoff's model.


Isn't there a large similarity between a Solomonoff machine that could learn
> a hierarchy of pattern-representing programs and Jeff Hawkins's hierarchical
> learning (as represented in the Serre paper)?  One could consider the
> patterns at each level of the hierarchy as sub-routines.  The system is
> designed to increase its representational efficiency by having
> representational subroutines available for use by multiple different
> patterns at higher compositional levels.  To the extent that a MOSES-type
> evolutionary system could be set to work making such representations more
> compact, it would become clear how semi-Solomonoff machines could be made to
> work in the practical world.
>

I think the point is that if you can do really, really good general
sequence prediction (via something impractical like Solomonoff induction,
or something practical like the cortex) then you're a long way towards
being able to build a pretty impressive AGI.  Some of Hutter's students
are interested in the latter.



> The definition of Solomonoff induction on the web, and even in Shane
> Legg's paper "Solomonoff Induction", makes it sound like it is merely
> Bayesian induction with priors picked based on Kolmogorov complexity.
>
Yes, that's all it is.

But statements made by Shane and Lukasz appear to imply that a Solomonoff
> machine uses programming and program size as a tool for pattern
> representation, generalization, learning, inference, and more.
>
All these programs are weighted into that universal prior.


> So I think (but I could well be wrong) I know what that means.
> Unfortunately I am a little fuzzy about whether NCD would take "what"
> information, "what-with-what" or binding information, or frequency
> information sufficiently into account to be an optimal measure of
> similarity.  Is this correct?
>
NCD is just a computable approximation.  The universal similarity metric
(in the Li and Vitanyi book that I cited) gives the pure incomputable
version.  The pure version basically takes all effective similarity
metrics into account when working out how similar two things are.  So if
you have some concept of similarity that you're interested in and it can
be programmed, it's already factored in.
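
For reference, NCD itself is easy to compute with any off-the-shelf
compressor standing in for Kolmogorov complexity; here is a small sketch
using zlib (the choice of compressor is mine, and compressed sizes are
only a rough proxy for true complexity):

```python
import zlib

# Normalized Compression Distance: a computable stand-in for the
# incomputable universal similarity metric, with zlib compressed
# length playing the role of C(.).
def ncd(x: bytes, y: bytes) -> float:
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

# Similar strings compress well together, so their NCD is smaller.
a = b"the quick brown fox jumps over the lazy dog" * 10
b_ = b"the quick brown fox jumps over the lazy cat" * 10
c = bytes(range(256)) * 2  # unrelated, hard-to-compress data
print(ncd(a, b_), ncd(a, c))
```

The result is only as good as the compressor: structure that zlib cannot
exploit will not show up in the distance.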

Cheers,
Shane

-----
This list is sponsored by AGIRI: http://www.agiri.org/email