Hi,

> You appear to be thinking of AIXI-tl as a fuzzy little harmless baby being
> confronted with some harsh trial.

Once again, your ability to see into my mind proves extremely flawed ;-)

You're right that my statement "AIXItl is slow at learning" was poorly
phrased, though.  It is very inefficient at learning in the sense that it
takes a huge number of computation steps to decide each action it takes.
However, in your PD scenario you're assuming that it has a fast enough
processor to do all this thinking in between each step of the iterated PD,
in which case, yeah, it has to be doing very, very fast operations.  AIXItl
is slow at learning if you count slowness in terms of computation steps, but
that's not the way your example wants us to look at things...
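
To put a rough number on "huge": if I'm reading Hutter's construction
right, at each cycle AIXItl considers every program of length at most l
and runs each for up to t steps, so the cost of choosing one action is of
order t*2^l computation steps.  A back-of-envelope sketch in Python, with
parameter values invented purely for illustration:

    # Rough estimate of AIXItl's per-cycle computation cost, assuming
    # the standard construction: enumerate all programs of length <= l
    # and run each for at most t steps.  l and t are made-up values.
    l = 20        # hypothetical maximum program length, in bits
    t = 10**6     # hypothetical time bound per program, in steps
    steps_per_action = t * 2**l
    print(f"~{steps_per_action:.2e} steps to choose one action")  # ~1.05e+12

Slow in computation steps, in other words, even if it needs only a few
rounds of actual interaction.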


>  > The question is whether after enough trials AIXI-tl figures out it's
>  > playing some entity similar to itself and learns how to act
>  > accordingly....  If so, then it's doing what AIXI-tl is supposed to do.
>  >
> AIXI-tl *cannot* figure this out because its control process is not
> capable of recognizing tl-computable transforms of its own policies and
> strategic abilities, *only* tl-computable transforms of its own direct
> actions.  Yes, it simulates entities who know this; it also simulates
> every possible other kind of tl-bounded entity.  The question is whether
> that internal knowledge appears as an advantage recognized by the control
> process, and given AIXI-tl's formal definition, it does not appear
> to do so.

I don't understand how you're deriving the conclusion in your final
sentence.

How do you know under what circumstances AIXItl would be led to adopt
operating programs that model its own policies and strategic abilities?

You may well be right that PD2 is not such a circumstance, but that doesn't
mean there are no such circumstances, or that such circumstances wouldn't be
common in the hypothetical life of a real embodied AIXItl.

>  > A human can also learn to solve vision recognition problems faster than
>  >  AIXI-tl, because we're wired for it (as we're wired for social
>  > gameplaying), whereas AIXI-tl has to learn
>
> AIXI-tl learns vision *instantly*.  The Kolmogorov complexity of a visual
> field is much less than its raw string, and the compact representation can
> be computed by a tl-bounded process.  It develops a visual cortex on the
> same round it sees its first color picture.

Yes, but that "visual cortex" would not be useful for anything.  It would
take some time for an embodied AIXItl to figure out how to recognize visual
patterns in a way that was useful to it in coordinating its actions.  Unless
it had a priori knowledge to guide it, this would be a substantial process
of trial and error learning.
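
As an aside on the Kolmogorov point: it's true that a structured visual
field compresses to far below its raw length.  A crude illustration in
Python, using zlib compression as a computable stand-in for Kolmogorov
complexity (the 100x100 "image" here is made up):

    import random
    import zlib

    # A "structured" visual field: horizontal stripes, 100x100 pixels.
    structured = bytes(((y // 10) % 2) * 255
                       for y in range(100) for x in range(100))
    # An unstructured field: uniformly random pixel values.
    random.seed(0)
    noise = bytes(random.randrange(256) for _ in range(100 * 100))

    print(len(zlib.compress(structured)))  # tiny: a compact description exists
    print(len(zlib.compress(noise)))       # ~10000: essentially incompressible

But finding a compact description of the visual field is not the same as
using it to coordinate actions, which is my point above.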

>  >> Humans can recognize a much stronger degree of similarity in human
>  >> Other Minds than AIXI-tl's internal processes are capable of
>  >> recognizing in any other AIXI-tl.
>  >
>  > I don't believe that is true.
>
> Mentally simulate the abstract specification of AIXI-tl instead of using
> your intuitions about the behavior of a generic reinforcement process.

Eliezer, I don't know what a "generic reinforcement process" is.  Of course
AIXItl is very different from an ordinary reinforcement learning system.

>  > OK... here's where the fact that you have a tabula rasa AIXI-tl in a
>  > very limiting environment comes in.
>  >
>  > In a richer environment, I don't see why AIXI-tl, after a long enough
>  > time, couldn't learn an operating program that implicitly embodied an
>  > abstraction over its own internal state.
>
> Because it is physically or computationally impossible for a tl-bounded
> program to access or internally reproduce the previously computed policies
> or t2^l strategic ability of AIXI-tl.

Yes, but why can't it learn patterns that let it approximately predict the
strategies of AIXI-tl?

>  > In an environment consisting solely of PD2, it may be that AIXI-tl will
>  > never have the inspiration to learn this kind of operating program.
>  > (I'm not sure.)
>  >
>  > To me, this says mostly that PD2 is an inadequate environment for any
>  > learning system to use, to learn how to become a mind.  If it ain't
>  > good enough for AIXI-tl to use to learn how to become a mind, over a
>  > very long period of time, it probably isn't good for any AI system to
>  > use to learn how to become a mind.
>
> Marcus Hutter has formally proved your intuitions wrong.  In any
> situation that does *not* break the formalism, AIXI-tl learns to equal or
> outperform any other process, despite being a tabula rasa, no matter how
> rich or poor its environment.

No, Marcus Hutter did not prove the intuition I expressed there wrong.  You
seem not to have understood what I was saying.

AIXI-tl can equal or outperform any other process so long as it is given a
lot more computational resources than the other process.  But that was not
the statement I was making.

What I was saying was that ANY reinforcement learning system, if its only
life experience is playing Prisoner's Dilemma against its clone, is going to
be unlikely to develop a generally intelligent mind.
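
As a gesture at what I mean, here is a toy self-play sketch in Python:
two identical, memoryless value learners (a crude stand-in for "any
reinforcement learning system", nothing like AIXItl, with the standard
PD payoffs) playing iterated PD against each other.  They settle into a
one-line defect-defect policy, nothing remotely mindlike:

    import random

    # Standard Prisoner's Dilemma payoffs: (player 1 reward, player 2 reward).
    PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
              ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
    q1 = {"C": 0.0, "D": 0.0}   # learner 1's action values
    q2 = {"C": 0.0, "D": 0.0}   # its identical clone's action values
    random.seed(0)

    def pick(q):
        if random.random() < 0.1:            # occasional exploration
            return random.choice(["C", "D"])
        return max(q, key=q.get)             # otherwise act greedily

    for _ in range(5000):
        a1, a2 = pick(q1), pick(q2)
        r1, r2 = PAYOFF[(a1, a2)]
        q1[a1] += 0.1 * (r1 - q1[a1])        # incremental value updates
        q2[a2] += 0.1 * (r2 - q2[a2])
    print(q1, q2)  # after enough rounds, both value D over C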


> Measured in computing cycles, yes.  Measured in rounds of information
> required, no.  AIXI-tl is defined to run on a very VERY fast computer.
> Marcus Hutter has formally proved your intuitions about the requirement of
> a rich environment or prior training to be wrong; I am trying to show that
> your intuitions about what AIXI-tl is capable of learning are wrong.

Again, you seem to have misunderstood my statements.

Hutter did not show that a rich environment is unnecessary for producing a
mind.

For instance, suppose you take an AIXI or AIXItl system and systematically
reward it every time it produces a "1" on its output screen, and punish it
every time it does something else.  Given this impoverished environment, the
AIXI/AIXItl system is not going to adopt an operating program that is
"mindlike" in any real sense.  It's going to adopt an operating program such
as "Print '1'".

The Prisoner's Dilemma environment is not as impoverished as the "reward
only printing 1" environment.  But it's still pretty impoverished, and hence
still pretty unlikely to lead to an AIXI/AIXItl system with a mindlike
operating program.

Hutter's theorem says that AIXItl will equal or outperform any other system
at getting rewarded, where rewards are determined by any computable
function, so long as AIXItl gets a lot more resources than the other system.
But if the reward function is too simplistic, then AIXItl will never learn
an interesting operating program...
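
For reference, here is the relevant claim rendered loosely in LaTeX.  This
is my paraphrase, glossing over the technical conditions in Hutter's formal
statement (e.g. his restriction to policies that can prove bounds on their
own values):

    \[
      V_{\mu}^{\mathrm{AIXI}tl} \;\ge\; V_{\mu}^{p}
      \quad \text{for every policy } p \text{ of length} \le l
      \text{ and per-cycle time} \le t,
    \]
    \[
      \text{at a per-cycle cost for AIXItl of order } t \cdot 2^{l}.
    \]
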
>
> But to follow either Hutter's argument or my own requires mentally
> reproducing more of the abstract properties of AIXI-tl, given its abstract
> specification, than your intuitions currently seem to be providing.  Do
> you have a non-intuitive mental simulation mode?

Eliezer, I understand Hutter's theorems, although I haven't read through all
his other papers carefully enough to understand the proofs.  As a
mathematician, yes, I am comfortable understanding and producing formal
reasoning even when it contradicts my intuition.

The arguments you're making are obviously less rigorously and formally
stated than the stuff in Hutter's paper, and so I need to use more intuition
to understand them.

-- Ben
