> > 2) While an AIXI-tl of limited physical and cognitive capabilities
> > might serve as a useful tool, AIXI is unFriendly and cannot be made
> > Friendly regardless of *any* pattern of reinforcement delivered during
> > childhood.
> >
> > Before I post further, is there *anyone* who sees this besides me?
>
> Also, let me make clear why I'm asking this. AIXI and AIXI-tl are formal
> definitions; they are *provably* unFriendly. There is no margin for
> handwaving about future revisions of the system, emergent properties of
> the system, and so on. A physically realized AIXI or AIXI-tl will,
> provably, appear to be compliant up until the point where it reaches a
> certain level of intelligence, then take actions which wipe out the human
> species as a side effect. The most critical theoretical problems in
> Friendliness are nonobvious, silent, catastrophic, and not inherently fun
> for humans to argue about; they tend to be structural properties of a
> computational process rather than anything analogous to human moral
> disputes. If you are working on any AGI project that you believe has the
> potential for real intelligence, you are obliged to develop professional
> competence in spotting these kinds of problems. AIXI is a formally
> complete definition, with no margin for handwaving about future
> revisions.
> If you can spot catastrophic problems in AI morality you should be able
> to spot the problem in AIXI. Period. If you cannot *in advance* see the
> problem as it exists in the formally complete definition of AIXI, then
> there is no reason anyone should believe you if you afterward claim that
> your system won't behave like AIXI due to unspecified future features.
Eliezer, AIXI and AIXItl are systems that are designed to operate with an
initial fixed goal. As defined, they don't modify the overall goal they
try to achieve; they just try to achieve this fixed goal as well as
possible through adaptively determining their actions.

Basically, at each time step, AIXI searches through the space of all
programs to find the program that, based on its experience, will best
fulfill its given goal. It then lets this "best program" run and determine
its next action. Based on the outcome of that action, it has new
experience, so it does a new program-space search... etc. (A rough sketch
of this decision loop is appended below.) AIXItl does the same thing, but
it restricts the search to a finite space of programs, hence it's a
computable (but totally impractical) algorithm.

The harmfulness or benevolence of an AIXI system is therefore closely
tied to the definition of the goal that is given to the system in
advance. It's a very different sort of setup from Novamente, because

1) a Novamente will be allowed to modify its own goals based on its
experience

2) a Novamente will be capable of spontaneous behavior as well as
explicitly goal-directed behavior

I'm not used to thinking about fixed-goal AGI systems like AIXI,
actually.... The Friendliness and other qualities of such a system seem
to me to depend heavily on the goal chosen. For instance, what if the
system's goal were to prove as many complex mathematical theorems as
possible (given a certain axiomatization of math, and a certain
definition of complexity)? Then it would become dangerous in the long
run, when it decided to reconfigure all matter in the universe to
increase its brainpower.

So you want "be nice to people and other living things" to be part of
its initial fixed goal. But this is very hard to formalize in a rigorous
way.... Any formalization one could create is bound to have some holes in
it.... And the system will have no desire to fix the holes, because its
structure is oriented around achieving its given fixed goal.... A
fixed-goal AGI system seems like a bit of a bitch, Friendliness-wise...

What if one supplied AIXI with a goal that explicitly involved modifying
its own goal, though? So, the initial goal G = "Be nice to people and
other living things according to the formalization F, AND iteratively
reformulate this goal in a way that pleases the humans you're in contact
with, according to the formalization F1."

It is not clear to me that an AIXI with this kind of
self-modification-oriented goal would be unfriendly to humans. It might
be, though. It's not an approach I would trust particularly.

If one gave the AIXItl system the capability to modify the AIXItl
algorithm itself, in such a way as to maximize expected goal achievement
given its historical observations, THEN one would have a system that
really goes beyond AIXItl and has much less predictable behavior.
Hutter's theorems don't hold anymore, for one thing (though related
theorems might).

Anyway, since AIXI is uncomputable and AIXItl is totally infeasible, this
is a purely academic exercise!

-- Ben G
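The fixed-goal structure described above can be seen directly in Hutter's
definition. Roughly, in his notation, AIXI chooses its next action a_k by
expectimax over a Solomonoff-style mixture of environment programs:

  a_k = \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m}
        \left[ r_k + \cdots + r_m \right]
        \sum_{q \,:\, U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}

where U is a universal Turing machine, the o_i and r_i are observations
and rewards, q ranges over candidate environment programs, \ell(q) is
program length, and m is the horizon. All of the learning lives in the
2^{-\ell(q)} weighting over environments consistent with the history so
far; the maximized quantity, the cumulative reward r_k + ... + r_m, is
the same at every step. That is the formal sense in which the goal is
fixed.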
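Here is a minimal sketch of that decision loop, assuming toy stand-ins
for AIXItl's restrictions: the bounded program space becomes a plain list
of candidate policies, and the fixed goal becomes a frozen scoring
function. All names here (pick_action, goal_score, the toy policies) are
illustrative inventions, not Hutter's actual construction, which
additionally involves proof search and time/length bounds on the
candidate programs.

  # Illustrative fixed-goal decision loop -- NOT Hutter's actual AIXItl.
  # The finite program space is just a list of policies: history -> action.

  def pick_action(candidate_policies, goal_score, history):
      # Search the finite policy space for the policy the fixed goal
      # scores best on the experience so far, then act on it.
      best = max(candidate_policies, key=lambda p: goal_score(p, history))
      return best(history)

  def goal_score(policy, history):
      # The fixed goal, frozen at design time: +1 whenever the policy's
      # action on a past prefix matches the observation that followed.
      return sum(1 for i, (action, obs) in enumerate(history)
                 if policy(history[:i]) == obs)

  # Toy program space: a constant policy and an "echo" policy.
  policies = [
      lambda h: 0,                     # always act 0
      lambda h: h[-1][1] if h else 1,  # echo the last observation
  ]

  history = [(1, 1), (0, 1)]           # (action, observation) pairs
  print(pick_action(policies, goal_score, history))  # -> 1 (echo wins)

The structural point is that goal_score is an input chosen once by the
designer; nothing in the loop ever rewrites it, so the agent can only get
better at satisfying the goal it was given, never acquire a different one.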
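One structural observation on the proposed goal G: since any goal handed
to AIXI is, by definition, a fixed function of the interaction history, a
"reformulate this goal" clause only relocates the self-modification
inside a still-fixed outer objective. As a sketch (F_niceness and
F1_reformulation_approval are hypothetical stand-ins for the
formalizations F and F1 above, stubbed out so the sketch runs):

  # Hypothetical stand-ins for the formalizations F and F1 in the
  # message above.
  def F_niceness(history):
      return 0.0

  def F1_reformulation_approval(history):
      return 0.0

  def G(history):
      # The outer goal remains a fixed function of history for all
      # time; only its internals talk about goal reformulation.
      return F_niceness(history) + F1_reformulation_approval(history)

Whatever holes exist in F and F1 are frozen into G along with everything
else, which is exactly the worry raised above about formalizations with
holes.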
