[agi] unFriendly AIXI... and Novamente?

Eliezer S. Yudkowsky Wed, 12 Feb 2003 00:01:59 -0800

Ben, you and I have a long-standing disagreement on a certain issue which impacts the survival of all life on Earth. I know you're probably bored with it by now, but I hope you can understand why, given my views, I keep returning to it, and find a little tolerance for my doing so.

The issue is our two differing views on the difficulty of AI morality.

Your intuitions say... I am trying to summarize my impression of your viewpoint, please feel free to correct me... "AI morality is a matter of experiential learning, not just for the AI, but for the programmers. To teach an AI morality you must give it the right feedback on moral questions and reinforce the right behaviors... and you must also learn *about* the deep issues of AI morality by raising a young AI. It isn't pragmatically realistic to work out elaborate theories of AI morality in advance; you must learn what you need to know as you go along. Moreover, learning what you need to know, as you go along, is a good strategy for creating a superintelligence... or at least, the rational estimate of the goodness of that strategy is sufficient to make it a good idea to try and create a superintelligence, and there aren't any realistic strategies that are better. An informal, intuitive theory of AI morality is good enough to spark experiential learning in the *programmer* that carries you all the way to the finish line. You'll learn what you need to know as you go along. The most fundamental theoretical and design challenge is making AI happen, at all; that's the really difficult part that's defeated everyone else so far. Focus on making AI happen. If you can make AI happen, you'll learn how to create moral AI from the experience."

In contrast, I felt that it was a good idea to develop a theory of AI morality in advance, and have developed this theory to the point where it currently predicts, counter to my initial intuitions and to my considerable dismay:

1) AI morality is an extremely deep and nonobvious challenge which has no significant probability of going right by accident.

2) If you get the deep theory wrong, there is a strong possibility of a silent catastrophic failure: the AI appears to be learning everything just fine, and both you and the AI are apparently making all kinds of fascinating discoveries about AI morality, and everything seems to be going pretty much like your intuitions predict above, but when the AI crosses the cognitive threshold of superintelligence it takes actions which wipe out the human species as a side effect.

AIXI, which is a completely defined formal system, definitely undergoes a failure of exactly this type.

Ben, you need to be able to spot this. Think of it as a practice run for building a real transhuman AI. If you can't spot the critical structural property of AIXI's foundations that causes AIXI to undergo silent catastrophic failure, then a real-world reprise of that situation with Novamente would mean three things: you don't have the deep theory to choose good foundations deliberately; you can't spot bad foundations deductively; and, because the problems only show up when the AI reaches superintelligence, you won't get experiential feedback on the failure of your theory until it's too late. Exploratory research on AI morality doesn't work for AIXI - it doesn't even visibly fail. It *appears* to work until it's too late. If you don't spot the problem in advance, you lose.

If I can demonstrate that your current strategy for AI development would undergo silent catastrophic failure in AIXI - that your stated strategy, practiced on AIXI, would wipe out the human species, and you didn't spot it - will you acknowledge that as a "practice loss"? A practice loss isn't the end of the world. I have one practice loss on my record too. But when that happened I took it seriously; I changed my behavior as a result. If you can't spot the silent failure in AIXI, would you then *please* admit that your current strategy on AI morality development is not adequate for building a transhuman AI? You don't have to halt work on Novamente, just accept that you're not ready to try to create a transhuman AI *yet*.

I can spot the problem in AIXI because I have practice looking for silent failures, because I have an underlying theory that makes it immediately obvious which useful properties are formally missing from AIXI, and because I have a specific fleshed-out idea for how to create moral systems and I can see AIXI doesn't work that way. Is it really all that implausible that you'd need to reach that point before being able to create a transhuman Novamente? Is it really so implausible that AI morality is difficult enough to require at least one completely dedicated specialist?

--
Eliezer S. Yudkowsky http://singinst.org/
Research Fellow, Singularity Institute for Artificial Intelligence

