Brad Wyble wrote:
>> There are simple external conditions that provoke protective
>> tendencies in humans following chains of logic that seem entirely
>> natural to us. Our intuition that reproducing these simple external
>> conditions serves to provoke protective tendencies in AIs is knowably
>> wrong, failing an unsupported specific complex miracle.
>
> Well said.
>
>> Or to put it another way, you see Friendliness in AIs as pretty
>> likely regardless, and you think I'm going to all these lengths to
>> provide a guarantee. I'm not. I'm going to all these lengths to
>> create a *significant probability* of Friendliness.
>
> You're mischaracterizing my position. I'm certainly not saying we'll
> get friendliness for free, but I was trying to reason by analogy (perhaps
> in a flawed way) that our best chance of success may be to model AGIs
> on our innate tendencies wherever possible. Human behavior is a
> knowable quality.

Okay... what I'm saying, basically, is that connecting AI morality to human morality turns out to be a very complex problem that is not solved by saying "let's copy human nature". You need a very specific description of what you have to copy, how you do the copying, and so on, and this involves all sorts of complex nonobvious concepts within a complex nonobvious theory that completely changes the way you see morality. It would even be fair to say, dismayingly, that in saying "let's build AGIs which reproduce certain human behaviors", you have not even succeeded in stating the problem, let alone the solution.

This isn't intended in any personal way, btw. It's just that, like, the fate of the world *does* actually depend on it and all, so I have to be very precise about how much progress has occurred at a given point of theoretical development, rather than offering encouragement.

> I perceived, based on the character of your discussion, that you would
> be unsatisfied with anything short of a formal, mathematical proof
> that any given AGI would not destroy us before giving the assent to
> turning it on. If that characterization was incorrect, the fault is
> mine.

No! It's *my* fault! You can't have any! Anyhow, I don't think such a formal proof is possible. The problem with the proposals I see is not that they are not *provably* Friendly, but that a rational extrapolation of them shows they are *unFriendly* barring a miracle. I'll take a proposal whose rational extrapolation is Friendliness and which seems to lie at a local optimum relative to the improvements I can imagine; proof is impossible.

--
Eliezer S. Yudkowsky http://singinst.org/
Research Fellow, Singularity Institute for Artificial Intelligence
