On 03/05/2008 12:36 PM, Mark Waser wrote:
snip...

The obvious starting point is to explicitly recognize that the point of Friendliness is that we wish to prevent the extinction of the *human race* and/or to prevent many other horrible nasty things that would make *us* unhappy. After all, this is why we believe Friendliness is so important. Unfortunately, the problem with this starting point is that it biases the search for Friendliness toward a specific type of Unfriendliness. In particular, in a later e-mail, I will show that several prominent features of Eliezer Yudkowsky's vision of Friendliness are actually distinctly Unfriendly and will directly lead to a system/situation that is less safe for humans.

One of the critically important advantages of my proposed definition/vision of Friendliness is that it is an attractor in state space. If a system finds itself outside (but necessarily somewhat/reasonably close to) an optimally Friendly state, it will actually DESIRE to reach or return to that state (and yes, I *know* that I'm going to have to prove that contention). While Eli's vision of Friendliness is certainly stable (i.e. the system won't intentionally become unfriendly), there is no "force" or desire helping it return to Friendliness if it deviates somehow due to an error or outside influence. I believe that this is a *serious* shortcoming in his vision of the extrapolation of the collective volition (and yes, this does mean both that I believe Friendliness is CEV and that I, personally, (and shortly, we collectively) can define a stable path to an attractor CEV that is provably sufficient, arguably optimal, and which should hold up under all future evolution).
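The attractor-versus-merely-stable distinction can be made concrete with a toy dynamical system. This sketch is purely illustrative (the one-dimensional state, the restoring gain, and the function names are my own assumptions, not anything from the actual proposal): an attractor state actively pulls the system back after a perturbation, while a merely stable state simply stays wherever an error leaves it.

```python
# Hypothetical toy model (not Waser's actual formalism): compare an
# attractor state, which restores itself after perturbation, with a
# merely stable state, which does not drift but never corrects either.

def step_attractor(x, target=0.0, gain=0.5):
    # Restoring dynamics: each step moves the state a fraction of the
    # way back toward the target ("Friendly") state.
    return x + gain * (target - x)

def step_merely_stable(x):
    # No restoring force: the state is preserved exactly, including any
    # deviation introduced by an error or outside influence.
    return x

def settle(step, x0, n=50):
    # Iterate the dynamics n times from an initial (perturbed) state.
    x = x0
    for _ in range(n):
        x = step(x)
    return x

perturbed = 1.0  # an error knocks both systems off the Friendly state (0.0)
print(settle(step_attractor, perturbed))      # decays back toward 0.0
print(settle(step_merely_stable, perturbed))  # remains at 1.0
```

The point of the toy is only the qualitative difference: under the attractor dynamics the deviation shrinks geometrically toward the Friendly state, while under the merely-stable dynamics the deviation persists indefinitely.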

TAKE-AWAY:  Friendliness is (and needs to be) an attractor CEV

PART 2 will describe how to create an attractor CEV and make it more obvious why you want such a thing.


!! Let the flames begin !!            :-)

1. How will the AI determine what is in the set of "horrible nasty thing[s] that would make *us* unhappy"? I guess this is related to how you will define the attractor precisely.

2. Preventing the extinction of the human race is pretty clear today, but *human race* will become increasingly fuzzy and hard to define, as will *extinction* when there are more options for existence than existence as meat. In the long term, how will the AI decide who is "*us*" in the above quote?

Thanks,

jk

-------------------------------------------
agi
Archives: http://www.listbox.com/member/archive/303/=now
RSS Feed: http://www.listbox.com/member/archive/rss/303/
