Ben Goertzel wrote:

>> Huh.  We may not be on the same page.  Using:
>> http://www.idsia.ch/~marcus/ai/aixigentle.pdf
>>
>> Page 5:
>>
>> "The general framework for AI might be viewed as the design and study
>> of intelligent agents [RN95]. An agent is a cybernetic system with some
>> internal state, which acts with output y_k on some environment in cycle
>> k, perceives some input x_k from the environment and updates its
>> internal state. Then the next cycle follows. We split the input x_k
>> into a regular part x'_k and a reward r_k, often called reinforcement
>> feedback. From time to time the environment provides non-zero reward to
>> the agent. The task of the agent is to maximize its utility, defined as
>> the sum of future rewards."

>> I didn't see any reward function V defined for AIXI in any of the
>> Hutter papers I read, nor is it at all clear how such a V could be
>> defined, given that the internal representation of "reality" produced
>> by Solomonoff induction is not fixed enough for any reward function to
>> operate on it in the same way that, e.g., our emotions bind to our own
>> standardized cognitive representations.
> Quite literally, we are not on the same page ;)

Thought so...

> Look at page 23, Definition 10 of the "intelligence ordering relation"
> (which says what it means for one system to be more intelligent than
> another).  And look at the start of Section 4.1, which Definition 10
> lives within.
>
> The reward function V is defined there, basically as cumulative reward
> over a period of time.  It's used all thru Section 4.1, and following
> that, it's used mostly implicitly inside the intelligence ordering
> relation.
The reward function V, however, is *not* part of AIXI's structure; it is
a test *applied to* AIXI from outside, as part of Hutter's optimality
proof.  AIXI itself is never handed V; it induces V, via Solomonoff
induction on its past rewards.

V can be at least as flexible as any criterion a (computable) human uses
to decide when and how hard to press the reward button, and AIXI's
approximation of V is not fixed at the start.  Given this, would you
regard AIXI as formally approximating the kind of goal learning that
Novamente is supposed to do?
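To make the distinction concrete, here's a toy sketch in Python -- my own
illustration, not Hutter's construction, and the finite "models" list
with made-up weights is only a stand-in for the Solomonoff mixture over
all programs.  The point is where the reward lives: the agent below
contains no reward function V anywhere; it only remembers the rewards the
environment happened to emit, and plans by maximizing mixture-predicted
future reward.

    def plan(models, weights, acts, horizon):
        """Expectimax over the model mixture: for each candidate action,
        average the models' predicted reward r_k, add the value-to-go,
        take the max.  (Real AIXI would also re-condition the mixture on
        percepts received mid-plan; the toy skips that.)"""
        if horizon == 0:
            return None, 0.0
        best_a, best_v = None, float("-inf")
        for a in (0, 1):  # toy binary action space
            r_bar = sum(w * m(acts + [a]) for m, w in zip(models, weights))
            _, togo = plan(models, weights, acts + [a], horizon - 1)
            v = r_bar + togo
            if v > best_v:
                best_a, best_v = a, v
        return best_a, best_v

    def reweight(models, weights, acts, rews):
        """Crude stand-in for Solomonoff conditioning: zero out models
        that failed to predict the rewards actually observed."""
        w2 = [w if all(m(acts[:k + 1]) == rews[k] for k in range(len(rews)))
              else 0.0
              for m, w in zip(models, weights)]
        s = sum(w2) or 1.0
        return [w / s for w in w2]

    # Two hypotheses about when the human presses the reward button:
    m1 = lambda acts: float(acts[-1] == 1)       # "action 1 is rewarded"
    m2 = lambda acts: float(len(acts) % 2 == 0)  # "even cycles rewarded"
    ws = reweight([m1, m2], [0.7, 0.3], acts=[1, 1], rews=[1.0, 1.0])
    print(plan([m1, m2], ws, acts=[1, 1], horizon=3))  # -> (1, 3.0)

The observed rewards kill off m2 and the agent keeps pressing 1 -- the
"goal" was learned from the reward channel, never written into the agent.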

> As Definition 10 makes clear, intelligence is defined relative to a
> fixed reward function.
A fixed reward function *outside* AIXI, so that the intelligence of AIXI can be defined relative to it... or am I wrong?
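Same toy terms, but from the examiner's side -- again my own sketch, not
the paper's formalism, so check me against the actual Definition 10.  The
examiner holds the environment and sums the rewards; the policy is a
black box, and "p is at least as intelligent as p2" comes out as "p's
cumulative reward is at least p2's in every environment considered":

    def cumulative_reward(policy, env_step, cycles):
        """Run the y_k / (x'_k, r_k) loop from page 5, summing rewards.
        policy: list of past percepts -> action;
        env_step: list of actions so far -> (regular input x', reward r)."""
        percepts, actions, total = [], [], 0.0
        for _ in range(cycles):
            y = policy(percepts)
            actions.append(y)
            x, r = env_step(actions)
            percepts.append((x, r))
            total += r
        return total

    def at_least_as_intelligent(p, p2, envs, cycles):
        """The V used here belongs to the examiner, outside either policy."""
        return all(cumulative_reward(p, env, cycles)
                   >= cumulative_reward(p2, env, cycles)
                   for env in envs)

Note that V shows up only in the test harness, which is my point above.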

> What the theorems about AIXItl state is that, given a fixed reward
> function, the AIXItl can do as well as any other algorithm at achieving
> this reward function, if you give it computational resources equal to
> those that the other algorithm got, plus a constant.  But the constant
> is fucking HUGE.
Actually, I think AIXItl is supposed to do as well as any time-t,
length-l bounded algorithm while itself using computation time on the
order of t*2^l per cycle -- a multiplicative blowup, not an additive
constant... though again perhaps I am wrong.
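Back-of-envelope arithmetic on why that blowup dwarfs any additive
constant (my numbers, just to gesture at the scale):

    l = 1000                 # a modest 1000-bit competitor program
    print(len(str(2 ** l)))  # 302 digits: a slowdown factor ~10^301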

> Whether you specify the fixed reward function in its cumulative version
> or not doesn't really matter...
Actually, AIXI's fixed horizon looks to me like it could give rise to some strange behaviors, but I think Hutter's already aware that this is probably AIXI's weakest link.
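The kind of strangeness I mean, in toy form: with a hard horizon m, a
payoff scheduled past cycle m is worth exactly nothing, so the chosen
action flips as the horizon approaches.  My toy numbers, not anything in
the paper:

    def choose(k, m):
        """At cycle k: 'invest' pays 10 at cycle k+3, 'grab' pays 1 now.
        A payoff landing beyond the horizon m counts for nothing."""
        invest_value = 10 if k + 3 <= m else 0
        return "invest" if invest_value > 1 else "grab"

    print([choose(k, m=10) for k in range(5, 11)])
    # ['invest', 'invest', 'invest', 'grab', 'grab', 'grab']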

--
Eliezer S. Yudkowsky http://singinst.org/
Research Fellow, Singularity Institute for Artificial Intelligence
