On 03/05/2008 12:36 PM, Mark Waser wrote:
snip...
The obvious starting point is to recognize explicitly that the point
of Friendliness is to prevent the extinction of the *human race*
and/or to prevent many other horrible, nasty things that would make
*us* unhappy. This, after all, is why we believe Friendliness is so
important. Unfortunately, the problem with this starting point is
that it biases the search for Friendliness toward a specific type of
Unfriendliness. In particular, in a later e-mail, I will show that
several prominent features of Eliezer Yudkowsky's vision of
Friendliness are actually distinctly Unfriendly and will directly
lead to a system/situation that is less safe for humans.
One of the critically important advantages of my proposed
definition/vision of Friendliness is that it is an attractor in state
space. If a system finds itself outside of (but reasonably close to)
an optimally Friendly state, it will actually DESIRE to reach or
return to that state (and yes, I *know* that I'm going to have to
prove that contention). While Eli's vision of Friendliness is
certainly stable (i.e., the system won't intentionally become
unfriendly), there is no "force" or desire helping it return to
Friendliness if it deviates due to an error or outside influence. I
believe that this is a *serious* shortcoming in his vision of the
extrapolation of the collective volition (and yes, this does mean
that I believe both that Friendliness is CEV and that I, personally
(and shortly, we collectively), can define a stable path to an
attractor CEV that is provably sufficient, arguably optimal, and
should hold up under all future evolution).
TAKE-AWAY: Friendliness is (and needs to be) an attractor CEV
PART 2 will describe how to create an attractor CEV and make it more
obvious why you want such a thing.
!! Let the flames begin !! :-)
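No flames yet, just two questions. First, though, to check that I
follow the stable-versus-attractor distinction, here is how I read it
as a toy one-dimensional sketch in Python. This is my construction,
not yours: the state variable, the restoring force, and the constant
0.5 are illustrative assumptions, with 0.0 standing in for the
optimally Friendly state.

def step_stable(x):
    # Merely stable: no restoring force; wherever an error or outside
    # influence leaves the system, it stays put (dx/dt = 0).
    return x

def step_attractor(x, pull=0.5):
    # Attractor: a restoring force pulls the state back toward the
    # Friendly state at 0.0 (dx/dt = -pull * x, discretized).
    return x - pull * x

# Perturb both systems away from Friendliness by the same error.
stable, attracted = 0.3, 0.3
for _ in range(20):
    stable = step_stable(stable)
    attracted = step_attractor(attracted)

print(stable)     # 0.3    (the deviation persists indefinitely)
print(attracted)  # ~3e-7  (the state converges back to Friendliness)

If that is the right reading, my questions below are really about
where the restoring force comes from and what it points at.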
1. How will the AI determine what is in the set of "horrible nasty
thing[s] that would make *us* unhappy"? I assume this comes down to
how precisely you define the attractor.
2. Preventing the extinction of the human race is pretty clear today,
but *human race* will become increasingly fuzzy and hard to define, as
will *extinction* when there are more options for existence than
existence as meat. In the long term, how will the AI decide who is
"*us*" in the above quote?
Thanks,
jk