Ben Goertzel wrote:
>> Huh. We may not be on the same page. Using: http://www.idsia.ch/~marcus/ai/aixigentle.pdf
>> Page 5: "The general framework for AI might be viewed as the design and study of intelligent agents [RN95]. An agent is a cybernetic system with some internal state, which acts with output y_k on some environment in cycle k, perceives some input x_k from the environment and updates its internal state. Then the next cycle follows. We split the input x_k into a regular part x'_k and a reward r_k, often called reinforcement feedback. From time to time the environment provides non-zero reward to the agent. The task of the agent is to maximize its utility, defined as the sum of future rewards."
>>
>> I didn't see any reward function V defined for AIXI in any of the Hutter papers I read, nor is it at all clear how such a V could be defined, given that the internal representation of "reality" produced by Solomonoff induction is not fixed enough for any reward function to operate on it in the same way that, e.g., our emotions bind to our own standardized cognitive representations.
>
> Quite literally, we are not on the same page ;)

Thought so...
> Look at page 23, Definition 10 of the "intelligence ordering relation" (which says what it means for one system to be more intelligent than another). And look at the start of Section 4.1, which Definition 10 lives within. The reward function V is defined there, basically as cumulative reward over a period of time. It's used all thru Section 4.1, and following that, it's used mostly implicitly inside the intelligence ordering relation.

The reward function V however is *not* part of AIXI's structure; it is rather a test *applied to* AIXI from outside as part of Hutter's optimality proof. AIXI itself is not given V; it induces V via Solomonoff induction on past rewards. V can be at least as flexible as any criterion a (computable) human uses to determine when and how hard to press the reward button, nor is AIXI's approximation of V fixed at the start. Given this, would you regard AIXI as formally approximating the kind of goal learning that Novamente is supposed to do?

> As Definition 10 makes clear, intelligence is defined relative to a fixed reward function.

A fixed reward function *outside* AIXI, so that the intelligence of AIXI can be defined relative to it... or am I wrong?

> What the theorems about AIXItl state is that, given a fixed reward function, the AIXItl can do as well as any other algorithm at achieving this reward function, if you give it computational resources equal to those that the other algorithm got, plus a constant. But the constant is fucking HUGE.

Actually, I think AIXItl is supposed to do as well as a tl-bounded algorithm given t*2^l resources... though again perhaps I am wrong.

> Whether you specify the fixed reward function in its cumulative version or not doesn't really matter...

Actually, AIXI's fixed horizon looks to me like it could give rise to some strange behaviors, but I think Hutter's already aware that this is probably AIXI's weakest link.

--
Eliezer S. Yudkowsky http://singinst.org/
Research Fellow, Singularity Institute for Artificial Intelligence
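
A minimal sketch of the agent-environment cycle quoted from page 5, and of V as cumulative reward over a fixed horizon measured from outside the agent. None of this is taken from Hutter's papers: the toy environment `coin_env`, the two policies, and the names `run_cycles` and `cumulative_reward` are all hypothetical, chosen only for illustration. The point is just that the score is computed over the rewards an agent receives, and that two agents can be ordered by it once the environment and horizon are fixed.

```python
from typing import Callable, List, Tuple

# Hypothetical types. A history is a list of (y_k, x'_k, r_k) triples.
History = List[Tuple[int, int, float]]
Policy = Callable[[History], int]
Environment = Callable[[int, int], Tuple[int, float]]


def coin_env(k: int, y: int) -> Tuple[int, float]:
    """Toy environment: reward 1.0 when output y matches the parity of cycle k."""
    target = k % 2
    reward = 1.0 if y == target else 0.0
    return target, reward  # (regular input x'_k, reward r_k)


def run_cycles(policy: Policy, env: Environment, horizon: int) -> History:
    """Run the act/perceive cycle for a fixed number of cycles."""
    history: History = []
    for k in range(1, horizon + 1):
        y = policy(history)          # agent acts with output y_k
        x, r = env(k, y)             # environment returns x'_k and r_k
        history.append((y, x, r))    # agent's state is the full history here
    return history


def cumulative_reward(history: History) -> float:
    """V in the loose sense used in the thread: sum of rewards over the run."""
    return sum(r for _, _, r in history)


def always_zero(history: History) -> int:
    """Policy that ignores its inputs."""
    return 0


def alternate(history: History) -> int:
    """Policy that outputs the opposite of the last regular input it saw."""
    if not history:
        return 1
    return 1 - history[-1][1]


if __name__ == "__main__":
    for policy in (always_zero, alternate):
        score = cumulative_reward(run_cycles(policy, coin_env, horizon=10))
        print(policy.__name__, score)
```

Neither policy is handed the scoring function; each sees only its own history of inputs and rewards, and the comparison between them is made by whoever runs the test.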
