Brad Wyble wrote:
>> There are simple external conditions that provoke protective
>> tendencies in humans, following chains of logic that seem entirely
>> natural to us. Our intuition that reproducing these simple external
>> conditions serves to provoke protective tendencies in AIs is knowably
>> wrong, absent an unsupported, specific, complex miracle.
>
> Well said.
>
>> Or to put it another way, you see Friendliness in AIs as pretty
>> likely regardless, and you think I'm going to all these lengths to
>> provide a guarantee. I'm not. I'm going to all these lengths to
>> create a *significant probability* of Friendliness.
>
> You're mischaracterizing my position. I'm certainly not saying we'll
> get friendliness for free; I was trying to reason by analogy (perhaps
> in a flawed way) that our best chance of success may be to model AGIs
> on our innate tendencies wherever possible. Human behavior is a
> knowable quality.
Okay... what I'm saying, basically, is that connecting AI morality to
human morality turns out to be a very complex problem, one not solved
by saying "let's copy human nature". You need a very specific description
of what you have to copy and how you do the copying, and this involves
all sorts of complex, nonobvious concepts within a complex, nonobvious
theory that completely changes the way you see morality. It would even be
fair to say, dismayingly, that in saying "let's build AGIs which
reproduce certain human behaviors", you have not even succeeded in
stating the problem, let alone the solution.
This isn't intended in any personal way, btw. It's just that the fate of
the world *does* actually depend on it, so I have to be very precise
about how much progress has occurred at a given point of theoretical
development, rather than simply offering encouragement.
> I perceived, based on the character of your discussion, that you would
> be unsatisfied with anything short of a formal, mathematical proof
> that any given AGI would not destroy us before giving your assent to
> turning it on. If that characterization was incorrect, the fault is
> mine.
No! It's *my* fault! You can't have any! Anyhow, I don't think such a
formal proof is possible. The problem with the proposals I see is not
that they are not *provably* Friendly, but that a rational extrapolation
of them shows they are *unFriendly* barring a miracle. Since proof is
impossible, I'll take a proposal whose rational extrapolation is to
Friendliness and which seems to lie at a local optimum relative to the
improvements I can imagine.
--
Eliezer S. Yudkowsky http://singinst.org/
Research Fellow, Singularity Institute for Artificial Intelligence