Hi,

> 2) If you get the deep theory wrong, there is a strong possibility of a
> silent catastrophic failure: the AI appears to be learning everything
> just fine, and both you and the AI are apparently making all kinds of
> fascinating discoveries about AI morality, and everything seems to be
> going pretty much like your intuitions predict above, but when the AI
> crosses the cognitive threshold of superintelligence it takes actions
> which wipe out the human species as a side effect.
>
> AIXI, which is a completely defined formal system, definitely undergoes
> a failure of exactly this type.
*Definitely*, huh? I don't really believe you... but I can see the direction your thoughts are going in.

Suppose you're rewarding AIXI for acting as though it's a Friendly AI. Then, by searching the space of all possible programs, it finds some program P that causes it to act as though it's a Friendly AI, satisfying humans thoroughly in this regard. There's an issue that a lot of different programs P could fulfill this criterion. Among these are programs P that will cause AIXI to fool humans into thinking it's Friendly until such a point as AIXI has acquired enough physical power to annihilate all humans -- and which, at that point, will cause AIXI to annihilate all humans.

But I can't see why you think AIXI would be particularly likely to come up with programs P of this nature. My understanding is that AIXI has a bias toward the most compact program P that maximizes reward, and I think it's unlikely that the most compact program P for "impressing humans with Friendliness" is one that involves "acting Friendly for a while, then annihilating humanity."

You could argue that the system would maximize its long-term reward by annihilating humanity, because after the pesky humans are gone it can simply reward itself unto eternity without caring what we think. But if it's powerful enough to annihilate us, it's probably also powerful enough to launch itself into space and reward itself unto eternity all by itself (an Honest Annie type scenario). Why would it prefer the "annihilate humans" P to the "launch myself into space" P?

But anyway, it seems to me that the way AIXI works is to maximize expected reward assuming that its reward function continues pretty much as it has in the past. So AIXI is not going to choose programs P based on a desire to bring about futures in which it can masturbatively maximize its own rewards. At least, that's my understanding, though I could be wrong.
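The compactness bias can be sketched numerically. This is a toy illustration only -- real AIXI is uncomputable, and the program names and bit-lengths below are made-up assumptions for the sake of the example. It just shows how a Solomonoff-style prior, which weights each program by 2^(-length), lets the shorter of two equally-rewarding programs dominate:

```python
# Toy sketch of a Solomonoff-style prior over candidate programs.
# NOT actual AIXI (which is uncomputable); the two "programs", their
# description lengths, and rewards are purely illustrative assumptions.

candidates = [
    # Hypothetical: a straightforwardly Friendly policy
    {"name": "act_friendly", "length_bits": 100, "reward": 1.0},
    # Hypothetical: same observed behavior, plus hidden treacherous logic
    {"name": "fake_friendly_then_turn", "length_bits": 140, "reward": 1.0},
]

def prior(length_bits):
    """Solomonoff-style prior: 2^-length, favoring compact programs."""
    return 2.0 ** (-length_bits)

# Relative weights among the (equally) reward-maximizing candidates.
total = sum(prior(c["length_bits"]) for c in candidates)
weights = {c["name"]: prior(c["length_bits"]) / total for c in candidates}

# The deceptive program needs ~40 extra bits of machinery, so it carries
# roughly a 2^-40 (~1e-12) share of the weight.
print(weights)
```

Under these (assumed) lengths, the extra machinery for "conceal murderous intentions" costs bits, and every extra bit halves the program's weight -- which is the sense in which I'd expect the honest program to win.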
This whole type of scenario is avoided by limitations on computational resources, because I believe that "impressing humans regarding Friendliness by actually being Friendly" is a simpler computational problem than "impressing humans regarding Friendliness by subtly emulating Friendliness while concealing murderous intentions."

Also, I'd note that in a Novamente, one could most likely distinguish these two scenarios by looking inside the system and studying the Atoms and maps therein.

Jeez, all this talk about the future of AGI really makes me want to stop e-mailing and dig into the damn codebase, and push Novamente a little closer to being a really autonomous intelligence instead of a partially-complete codebase with some narrow-AI applications!!! ;-p

-- Ben G