I like the attractor approach, I really do! But I think the version you give needs a fundamental clarification.
How about "Don't interfere with the goals of others unless not doing so > basically prevents you fulfilling your goals (explicitly not including low > probability freak events for you pedants out there)" > I think there's something paradoxical about this. My goals are my goals, and if I'm arbitrarily intelligent, those goals will dictate nearly all my actions; only in quite trivial cases will I have the ethical leeway to be nice to others. You exclude "low probability freaks", but don't set a specific line. I think a specific line is possible to set (just pick an arbitrary line, let's say .01 for discussion), BUT I think this fundamentally alters the definition of friendlyness. In other words, I think there are two seperate and very different definitions of friendlyness we could use, and you're trying to take the best properties of both (which is not possible). I think the "attractor" will likely start with one type and eventually transition to the other. But I'm getting ahead of myself. The two types are: 1) Rational-self-interest friendliness. In a group of individuals of approximately equal power, if the goals of the individuals are partially conflicting and partially overlapping, it is rational to cooperate (or cheat without getting caught). It is also rational to try to detect other cheaters and refuse to cooperate with them (despite the fact that you try to cheat others). 2) Friendliness as a modification of goals. Agents voluntarily modify their own goal set to desire friendliness in and of itself. Agents now may cheat if absolutely necessary (by some definition), but will ascribe cheating a negative utility in and of itself (not just as a subgoal). You are mainly concerned with the first sort, but some statements you make only apply to the second sort. For example, if we assign the probability of .01 as a cutoff for our goals, then we no longer have the goal "maximize the probability of event X", but have ACTUALLY ALTERED OUR GOAL to read "When probability of X is below .99, maximize probability of X; when probability of X is above .99, maximize the goals of other Friendlies". On the other hand, it is rational to CLAIM that we changed our goal set without doing so (if we don't think we'll be caught). AGIs have the potential to actually reprogram themselves in this way, and cheack eachother's code to verify the change, whereas human societies can only *attempt* to change people's goals. But, of course, AGIs may try to fool eachother too. I think (2) is the ultimate attractor, because requiring all friendlies to actually change their goal set is the best insurance against cheating. It can be rational to make small concessions in goal-modification if the alternative is to be left out of the friendly society; changing your goal can be within self-interest. Such changes would begin small (say, assign your own goals a weight of .95 and other's goals a weight of .05, or a cutoff as mentioned before) but the attractor would gradually increase the amount of modification (*possibly* within some bounds). The existance of (1) provides a slippery slope into the (2)-attractor, making friendliness fairly likely. But there is a danger if one entity becomes the most powerful (again, as with human societies). This is actually fairly likely if one AI research group achaives AGI first, unless that group purposefully sets up a society of AIs. PS- These views are of coruse wholly my own opinion, as I can't cite any facts to back them up. Also, shouldn't this discussion be on the singularity list? 
PS- These views are of course wholly my own opinion, as I can't cite any facts to back them up. Also, shouldn't this discussion be on the singularity list?

PPS- These arguments *seem* to rely on "goal-stack" AI, since they discuss single goals, but perhaps they generalise.
