Ben Goertzel wrote:
The harmfulness or benevolence of an AIXI system is therefore closely tied
to the definition of the goal that is given to the system in advance.
Under AIXI the goal is not given to the system in advance; rather, the system learns the humans' goal pattern through Solomonoff induction on the reward inputs. Technically, in fact, it would be entirely feasible to give AIXI *only* reward inputs, although in this case it might require a long time for AIXI to accumulate enough data to constrain the Solomonoff-induced representation to a sufficiently detailed model of reality that it could successfully initiate complex actions. The utility of the non-reward input is that it provides additional data, causally related to the mechanisms producing the reward input, upon which Solomonoff induction can also be performed. Agreed?
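To make the picture concrete, here is a toy sketch of the kind of mixture I have in mind: hypotheses weighted by 2^-length, kept only if they reproduce the entire joint (observation, reward) history. The predicts() / length_bits interface is something I made up for illustration; actual AIXI sums over all programs and is uncomputable.

def solomonoff_weight(program_length_bits):
    """Prior weight 2^-l(p) for a hypothesis encoded in l(p) bits."""
    return 2.0 ** -program_length_bits

def posterior(hypotheses, history):
    """Renormalized weights of the hypotheses that reproduce the whole
    joint (observation, reward) history -- both channels at once."""
    weights = {}
    for h in hypotheses:
        if all(h.predicts(t, obs, reward)
               for t, (obs, reward) in enumerate(history)):
            weights[h] = solomonoff_weight(h.length_bits)
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()} if total else {}

Dropping the observation channel just means the history that constrains the mixture shrinks to the reward sequence alone, which is why learning would be slower but not impossible.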

It's a very different sort of setup than Novamente, because

1) a Novamente will be allowed to modify its own goals based on its
experience.
Depending on the pattern of inputs and rewards, AIXI will modify its internal representation of the algorithm which it expects to determine future rewards. Would you say that this is roughly analogous to Novamente's learning of goals based on experience, or is there in your view a fundamental difference? And if so, is AIXI formally superior or in some way inferior to Novamente?

2) a Novamente will be capable of spontaneous behavior as well as explicitly
goal-directed behavior
If the purpose of spontaneous behavior is to provoke learning experiences, this behavior is implicit in AIXI as well, though not obviously so. I'm actually not sure about this because Hutter doesn't explicitly discuss it. But it looks to me like AIXI, under its formal definition, emergently exhibits "curiosity" wherever there are, for example, two equiprobable models of reality which determine different rewards and can be distinguished by some test. What we interpret as "spontaneous" behavior would then emerge from a horrendously uncomputable exploration of all possible realities to find tests which are ultimately likely to result in distinguishing data, but in ways which are not at all obvious to any human observer. Would it be fair to say that AIXI's "spontaneous behavior" is formally superior to Novamente's spontaneous behavior?
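A toy illustration of why I'd expect that to fall out of the mixture (the payoff numbers are invented, and this is my reading of the formalism, not anything Hutter states): with two equiprobable reward models, a distinguishing test is worth the difference between acting under uncertainty and acting with knowledge of the true model.

def expected_reward_without_test(models, actions):
    """Best single action chosen under uncertainty over the models."""
    return max(sum(p * model[a] for model, p in models) for a in actions)

def expected_reward_with_test(models, actions, test_cost=0.0):
    """Run the distinguishing test first, then act optimally per model."""
    return sum(p * max(model[a] for a in actions) for model, p in models) - test_cost

# Two equiprobable models that reward opposite actions:
model_A = {"act1": 1.0, "act2": 0.0}
model_B = {"act1": 0.0, "act2": 1.0}
models = [(model_A, 0.5), (model_B, 0.5)]
actions = ["act1", "act2"]

print(expected_reward_without_test(models, actions))               # 0.5
print(expected_reward_with_test(models, actions, test_cost=0.1))   # 0.9

Since 0.9 > 0.5, the expectimax over the mixture prefers the "spontaneous-looking" test, even though no explicit curiosity drive was ever written down.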

I'm not used to thinking about fixed-goal AGI systems like AIXI,
actually....

The Friendliness and other qualities of such a system seem to me to depend
heavily on the goal chosen.
Again, AIXI as a formal system has no goal definition. [Note: I may be wrong about this; Ben Goertzel and I seem to have acquired different models of AIXI and it is very possible that mine is the wrong one.] It is tempting to think of AIXI as Solomonoff-inducing a goal pattern from its rewards, and Solomonoff-inducing reality from its main input channel, but actually AIXI induces the combined reality-and-reward pattern from the reward channel and the input channel simultaneously. In theory AIXI could operate on the reward channel alone; it just might take a long time before the reward channel gave enough data to constrain its reality-and-reward model to the point where AIXI could effectively model reality and hence generate complex reward-maximizing actions.

For instance, what if the system's goal were to prove as many complex
mathematical theorems as possible (given a certain axiomatization of math,
and a certain definition of complexity).  Then it would become dangerous in
the long run when it decided to reconfigure all matter in the universe to
increase its brainpower.

So you want "be nice to people and other living things" to be part of its
initial fixed goal.  But this is very hard to formalize in a rigorous
way....  Any formalization one could create is bound to have some holes in
it....  And the system will have no desire to fix the holes, because its
structure is oriented around achieving its given fixed goal....

A fixed-goal AGI system seems like a bit of a bitch, Friendliness-wise...
If the humans see that AIXI seems to be dangerously inclined toward just proving math theorems, they might decide to press the reward button when AIXI provides cures for cancer, or otherwise helps people. AIXI would then modify its combined reality-and-reward representation accordingly to embrace the new simplest explanation that accounted for *all* the data, i.e., its reward function would then have to account for mathematical theorems *and* cancer cures *and* any other kind of help that humans had, in the past, pressed the reward button for.
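In toy form (hypothesis names and bit-lengths invented for the example): the theorems-only explanation simply drops out of the mixture once it fails to predict the new reward data, and the shortest hypothesis that covers *all* of it is what remains.

reward_history = ["theorem", "theorem", "cancer_cure", "helped_person"]

hypotheses = {
    # name: (predicate over rewarded events, assumed description length in bits)
    "rewards_theorems_only":  (lambda e: e == "theorem", 20),
    "rewards_help_to_humans": (lambda e: e in {"theorem", "cancer_cure",
                                               "helped_person"}, 35),
}

consistent = {name: bits for name, (pred, bits) in hypotheses.items()
              if all(pred(e) for e in reward_history)}
print(min(consistent, key=consistent.get))   # -> rewards_help_to_humans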

Would you say this is roughly analogous to the kind of learning you intend Novamente to perform? Or perhaps even an ideal form of such learning?

What if one supplied AIXI with a goal that explicitly involved modifying its
own goal, though?
Self-modification in any form completely breaks Hutter's definition, and you no longer have an AIXI any more. The question is whether Hutter's adaptive reality-and-reward algorithm encapsulates the behaviors you want... do you think it does?

--
Eliezer S. Yudkowsky http://singinst.org/
Research Fellow, Singularity Institute for Artificial Intelligence
