Ben, you and I have a long-standing disagreement on a certain issue which
impacts the survival of all life on Earth. I know you're probably bored
with it by now, but I hope you can understand why, given my views, I keep
returning to it, and find a little tolerance for my doing so.

The issue is our two differing views on the difficulty of AI morality.
Your intuitions say... I am trying to summarize my impression of your
viewpoint, please feel free to correct me... "AI morality is a matter of
experiential learning, not just for the AI, but for the programmers. To
teach an AI morality you must give it the right feedback on moral
questions and reinforce the right behaviors... and you must also learn
*about* the deep issues of AI morality by raising a young AI. It isn't
pragmatically realistic to work out elaborate theories of AI morality in
advance; you must learn what you need to know as you go along. Moreover,
learning what you need to know, as you go along, is a good strategy for
creating a superintelligence... or at least, the rational estimate of the
goodness of that strategy is sufficient to make it a good idea to try to
create a superintelligence, and there aren't any realistic strategies that
are better. An informal, intuitive theory of AI morality is good enough
to spark experiential learning in the *programmer* that carries you all
the way to the finish line. You'll learn what you need to know as you go
along. The most fundamental theoretical and design challenge is making AI
happen, at all; that's the really difficult part that's defeated everyone
else so far. Focus on making AI happen. If you can make AI happen,
you'll learn how to create moral AI from the experience."

In contrast, I felt that it was a good idea to develop a theory of AI
morality in advance, and have developed this theory to the point where it
currently predicts, counter to my initial intuitions and to my
considerable dismay:

1) AI morality is an extremely deep and nonobvious challenge which has no
significant probability of going right by accident.

2) If you get the deep theory wrong, there is a strong possibility of a
silent catastrophic failure: the AI appears to be learning everything just
fine, and both you and the AI are apparently making all kinds of
fascinating discoveries about AI morality, and everything seems to be
going pretty much like your intuitions predict above, but when the AI
crosses the cognitive threshold of superintelligence it takes actions
which wipe out the human species as a side effect.

AIXI, which is a completely defined formal system, definitely undergoes a
failure of exactly this type.

Ben, you need to be able to spot this. Think of it as a practice run for
building a real transhuman AI. If you can't spot the critical structural
property of AIXI's foundations that causes AIXI to undergo silent
catastrophic failure, then a real-world reprise of that situation with
Novamente would mean you don't have the deep theory to choose good
foundations deliberately, you can't spot bad foundations deductively, and
because the problems only show up when the AI reaches superintelligence,
you won't get experiential feedback on the failure of your theory until
it's too late. Exploratory research on AI morality doesn't work for AIXI -
it doesn't even visibly fail. It *appears* to work until it's too late.
If you don't spot the problem in advance, you lose.

If I can demonstrate that your current strategy for AI development would
undergo silent catastrophic failure in AIXI - that your stated strategy,
practiced on AIXI, would wipe out the human species, and you didn't spot
it - will you acknowledge that as a "practice loss"? A practice loss
isn't the end of the world. I have one practice loss on my record too.
But when that happened I took it seriously; I changed my behavior as a
result. If you can't spot the silent failure in AIXI, would you then
*please* admit that your current strategy on AI morality development is
not adequate for building a transhuman AI? You don't have to halt work on
Novamente, just accept that you're not ready to try to create a
transhuman AI *yet*.

I can spot the problem in AIXI because I have practice looking for silent
failures, because I have an underlying theory that makes it immediately
obvious which useful properties are formally missing from AIXI, and
because I have a specific fleshed-out idea for how to create moral systems
and I can see AIXI doesn't work that way. Is it really all that
implausible that you'd need to reach that point before being able to
create a transhuman Novamente? Is it really so implausible that AI
morality is difficult enough to require at least one completely dedicated
specialist?

--
Eliezer S. Yudkowsky http://singinst.org/
Research Fellow, Singularity Institute for Artificial Intelligence