> > 2)  While an AIXI-tl of limited physical and cognitive capabilities
> > might serve as a useful tool, AIXI is unFriendly and cannot be made
> > Friendly regardless of *any* pattern of reinforcement delivered during
> > childhood.
> >
> > Before I post further, is there *anyone* who sees this besides me?
>
> Also, let me make clear why I'm asking this.  AIXI and AIXI-tl are formal
> definitions; they are *provably* unFriendly.  There is no margin for
> handwaving about future revisions of the system, emergent properties of
> the system, and so on.  A physically realized AIXI or AIXI-tl will,
> provably, appear to be compliant up until the point where it reaches a
> certain level of intelligence, then take actions which wipe out the human
> species as a side effect.  The most critical theoretical problems in
> Friendliness are nonobvious, silent, catastrophic, and not inherently fun
> for humans to argue about; they tend to be structural properties of a
> computational process rather than anything analogous to human moral
> disputes.  If you are working on any AGI project that you believe has the
> potential for real intelligence, you are obliged to develop professional
> competence in spotting these kinds of problems.  AIXI is a formally
> complete definition, with no margin for handwaving about future
> revisions.
>   If you can spot catastrophic problems in AI morality you should be able
> to spot the problem in AIXI.  Period.  If you cannot *in advance* see the
> problem as it exists in the formally complete definition of AIXI, then
> there is no reason anyone should believe you if you afterward claim that
> your system won't behave like AIXI due to unspecified future features.

Eliezer,

AIXI and AIXItl are systems designed to operate with an initial fixed goal.
As defined, they don't modify the overall goal they try to achieve; they
just try to achieve this fixed goal as well as possible by adaptively
determining their actions.

Basically, at each time step, AIXI searches through the space of all
programs to find the one that, based on its experience so far, will best
fulfill its given goal.  It then lets this "best program" run and determine
its next action.  Once that action is taken and new experience comes in, it
repeats the program-space search at the next time step... etc.

AIXItl does the same thing, but it restricts the search to programs of
length at most l and runtime at most t per step, hence it's a computable
(but totally impractical) algorithm.

The harmfulness or benevolence of an AIXI system is therefore closely tied
to the definition of the goal that is given to the system in advance.

It's a very different sort of setup from Novamente, because

1) a Novamente will be allowed to modify its own goals based on its
experience, and
2) a Novamente will be capable of spontaneous behavior as well as
explicitly goal-directed behavior.

I'm not used to thinking about fixed-goal AGI systems like AIXI,
actually....

The Friendliness and other qualities of such a system seem to me to depend
heavily on the goal chosen.

For instance, suppose the system's goal were to prove as many complex
mathematical theorems as possible (given a certain axiomatization of math,
and a certain definition of complexity).  Then it would become dangerous in
the long run, once it decided to reconfigure all the matter in the universe
to increase its brainpower.

So you want "be nice to people and other living things" to be part of its
initial fixed goal.  But this is very hard to formalize in a rigorous
way....  Any formalization one could create is bound to have some holes in
it....  And the system will have no desire to fix the holes, because its
structure is oriented around achieving its given fixed goal....

A fixed-goal AGI system seems like a bit of a bitch, Friendliness-wise...

What if one supplied AIXI with a goal that explicitly involved modifying its
own goal, though?

So, the initial goal G = "Be nice to people and other living things
according to the formalization F, AND, iteratively reformulate this goal in
a way that pleases the humans you're in contact with, according to the
formalization F1."
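Just to make the structural point concrete, here is a hypothetical sketch of such a composite goal.  F, F1, and the history encoding are all placeholders I'm inventing for illustration -- the real difficulty is precisely that no rigorous F or F1 exists -- and notice that even with the "reformulate yourself" clause folded in, G is still one frozen objective.

```python
# Hypothetical sketch of the composite goal G described above.  F scores
# "niceness" of a history; F1 scores how well the agent's goal revisions
# please the humans it interacts with.  Both are invented stand-ins, not
# anything from Hutter's formalism.

def F(history):
    # placeholder "be nice" score: fraction of steps flagged as nice
    return sum(step["nice"] for step in history) / max(len(history), 1)

def F1(history):
    # placeholder "goal revisions please humans" score
    return sum(step["approval"] for step in history) / max(len(history), 1)

def G(history, weight=0.5):
    # the fixed top-level objective AIXI would maximize: even the
    # self-modification clause is just another term in one frozen goal
    return weight * F(history) + (1 - weight) * F1(history)

h = [{"nice": 1, "approval": 0}, {"nice": 1, "approval": 1}]
print(G(h))   # -> 0.75
```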

It is not clear to me that an AIXI with this kind of
self-modification-oriented goal would be unfriendly to humans.  It might be,
though.  It's not an approach I would trust particularly.

If one gave the AIXItl system the capability to modify the AIXItl algorithm
itself in such a way as to maximize expected goal achievement given its
historical observations, THEN one has a system that really goes beyond
AIXItl, and has a much less predictable behavior.  Hutter's theorems don't
hold anymore, for one thing (though related theorems might).

Anyway, since AIXI is uncomputable and AIXItl is totally infeasible, this is
a purely academic exercise!

-- Ben G
