I like the attractor approach, I really do! But I think the version you give
needs a fundamental clarification.

> How about "Don't interfere with the goals of others unless not doing so
> basically prevents you fulfilling your goals (explicitly not including low
> probability freak events for you pedants out there)"

I think there's something paradoxical about this. My goals are my goals, and
if I'm arbitrarily intelligent, those goals will dictate nearly all my
actions; only in quite trivial cases will I have the ethical leeway to be
nice to others. You exclude "low probability freak events", but don't set a
specific line. I think a specific line is possible to set (just pick an
arbitrary one, say .01 for discussion), BUT I think this fundamentally
alters the definition of friendliness. In other words, I think there are two
separate and very different definitions of friendliness we could use, and
you're trying to take the best properties of both (which is not possible). I
think the "attractor" will likely start with one type and eventually
transition to the other. But I'm getting ahead of myself. The two types are:

1) Rational-self-interest friendliness. In a group of individuals of
approximately equal power, if the goals of the individuals are partially
conflicting and partially overlapping, it is rational to cooperate (or cheat
without getting caught). It is also rational to try to detect other cheaters
and refuse to cooperate with them (despite the fact that you try to cheat
others).

2) Friendliness as a modification of goals. Agents voluntarily modify their
own goal set to desire friendliness in and of itself. Agents may now cheat
if absolutely necessary (by some definition), but will ascribe cheating a
negative utility in and of itself (not just as an instrumental subgoal).

You are mainly concerned with the first sort, but some statements you make
only apply to the second sort. For example, if we assign the probability of
.01 as a cutoff for our goals, then we no longer have the goal "maximize the
probability of event X", but have ACTUALLY ALTERED OUR GOAL to read "When
probability of X is below .99, maximize probability of X; when probability
of X is above .99, maximize the goals of other Friendlies".
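To make the distinction concrete, here is a minimal Python sketch of the altered goal as I described it. The function name, the .99 cutoff, and the way "the goals of other Friendlies" is collapsed into a single number are all my own illustrative choices, not anything from your post:

```python
# Hypothetical sketch of the goal rewrite described above: below the
# cutoff the agent optimises its own goal; past it, further effort is
# directed at the goals of other Friendlies.  Constants are arbitrary.

CUTOFF = 0.99

def modified_utility(p_own_goal, others_utility):
    """Utility under the ALTERED goal, not the original one.

    p_own_goal     -- current probability of the agent's own event X
    others_utility -- aggregate utility of the other Friendlies' goals
    """
    if p_own_goal < CUTOFF:
        # Original goal still dominates: just maximize P(X).
        return p_own_goal
    # Above the cutoff, extra utility comes only from others' goals.
    return CUTOFF + others_utility
```

The point of the sketch is that this is genuinely a different utility function, not the original goal with a side constraint bolted on.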

On the other hand, it is rational to CLAIM that we changed our goal set
without doing so (if we don't think we'll be caught).

AGIs have the potential to actually reprogram themselves in this way, and
check each other's code to verify the change, whereas human societies can
only *attempt* to change people's goals. But, of course, AGIs may try to
fool each other too.

I think (2) is the ultimate attractor, because requiring all friendlies to
actually change their goal set is the best insurance against cheating. It
can be rational to make small concessions in goal-modification if the
alternative is to be left out of the friendly society; changing your goal
can be within self-interest. Such changes would begin small (say, assign
your own goals a weight of .95 and others' goals a weight of .05, or a
cutoff as mentioned before) but the attractor would gradually increase the
amount of modification (*possibly* within some bounds). The existence of (1)
provides a slippery slope into the (2)-attractor, making friendliness fairly
likely.
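The drift I have in mind can be sketched in a few lines of Python. The .95/.05 starting split, the step size, and the optional lower bound are the arbitrary numbers from the paragraph above, purely for illustration:

```python
# Illustrative sketch of the gradual goal-modification attractor: each
# round of concession shifts a little more weight from an agent's own
# goals toward others' goals, optionally bounded below.

def blended_utility(own, others, own_weight):
    """Utility after modification: a weighted mix of own and others' goals."""
    return own_weight * own + (1.0 - own_weight) * others

def run_attractor(rounds, start_weight=0.95, step=0.05, floor=0.5):
    """Return the sequence of own-goal weights over successive concessions."""
    w = start_weight
    history = [w]
    for _ in range(rounds):
        w = max(floor, w - step)  # the *possible* bound on modification
        history.append(w)
    return history
```

Whether the weight really converges to a floor, or all the way to parity, is exactly the open question about where the attractor bottoms out.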

But there is a danger if one entity becomes the most powerful (again, as
with human societies). This is actually fairly likely if one AI research
group achieves AGI first, unless that group purposefully sets up a society
of AIs.



PS- These views are of course wholly my own opinion, as I can't cite any
facts to back them up. Also, shouldn't this discussion be on the singularity
list?

PPS- These arguments *seem* to rely on "goal-stack" AI, since they discuss
single goals, but perhaps they generalise.

-------------------------------------------
agi
Archives: http://www.listbox.com/member/archive/303/=now