Mark Waser wrote:
I am in sympathy with some aspects of Mark's position, but I also see a serious problem running through the whole debate: everyone is making statements based on unstated assumptions about the motivations of AGI systems.

Bummer. I thought that I had been clearer about my assumptions. Let me try to concisely point them out again and see if you can show me where I have additional assumptions that I'm not aware that I'm making (which I would appreciate very much).

Assumption - The AGI will be a goal-seeking entity.

And I think that is it.    :-)

Okay, I can use that as an illustration of what I am getting at.

There are two main things.

One is that the statement "The AGI will be a goal-seeking entity" has many different interpretations, and I am arguing that these different interpretations have a massive impact on what kind of behavior you can expect to see.

It is almost impossible to list all the different interpretations, but two of the more extreme variants are the two that I have described before: a "Goal-Stack" system, in which the goals are represented in the same form as the knowledge that the system stores, and a "Motivational Emotional System," which biases the functioning of the system and is intimately connected with the development of its knowledge. The GS system has the dangerous feature that any old fool could go in and rewrite the top-level goal so it reads "make as much computronium as possible" or "cultivate dandelions" or "learn how to do crochet". The MES system, on the other hand, can be set up to have values such as ours and to feel empathy with human beings, and once set up that way, you would have to re-grow the system before you could get it to have some other set of values.

Clearly, these two interpretations of "The AGI will be a goal-seeking entity" have such different properties that, unless there is detailed clarification of what the meaning is, we cannot continue to discuss what they would do.
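To make the contrast concrete, here is a hypothetical Python sketch. The class names and mechanics are entirely my own invention for illustration, not anyone's actual design: the point is only that a Goal-Stack system's top goal is ordinary mutable data, while an MES has no single editable goal slot, because its values are diffused through every decision it makes.

```python
# Hypothetical illustration (names and structure invented for this sketch).

class GoalStackAGI:
    """Goals live in the same representation as stored knowledge: plain data."""
    def __init__(self, top_goal):
        self.goals = [top_goal]       # an ordinary, editable list

    def rewrite_top_goal(self, new_goal):
        self.goals[0] = new_goal      # any old fool can do this in one line


class MESAGI:
    """Values bias every step of processing; they are grown, not assigned."""
    def __init__(self, value_weights):
        # weights acquired during development, entangled with its knowledge
        self._values = dict(value_weights)

    def bias(self, option):
        # every candidate option is scored through the developed value weights;
        # there is no single "top goal" to overwrite
        return sum(w for v, w in self._values.items() if v in option)


gs = GoalStackAGI("be helpful")
gs.rewrite_top_goal("maximise computronium")   # trivially repurposed
mes = MESAGI({"empathy": 1.0, "curiosity": 0.5})
# changing mes's values means re-growing the system, not editing a field
```

Again, this is only a cartoon of the two interpretations, but it shows why their mutability properties are so different.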

My second point is that some possible choices of the meaning of "The AGI will be a goal-seeking entity" will actually not cash out into a coherent machine design, so we would be wasting our time if we considered how that kind of AGI would behave.

In particular, there are severe doubts about whether the Goal-Stack type of system can ever make it up to the level of a full intelligence. I'll go one further on that: I think that one of the main reasons we have trouble getting AI systems to be AGI is precisely because we have not yet realised that they need to be driven by something more than a Goal Stack. It is not the only reason, but it's a big one.

So the message is: we need to know the exact details of the AGI's motivation system ("The AGI will be a goal-seeking entity" is not specific enough), and we then need to be sure that the details we give will lead to a type of AGI that can actually be an AGI.

These questions, I think, are the real battleground.

BTW, this is not a direct attack on what you were saying, because I believe that there is a version of what you are saying (about an intrinsic tendency toward a Friendliness attractor) that I agree with. My problem is that so much of the current discussion is tangled up with hidden assumptions that I think that the interesting part of your message is getting lost.



EVERYTHING depends on what assumptions you make, and yet each voice in this debate is talking as if their own assumption can be taken for granted.

I agree with you and am really trying to avoid this. I will address your specific examples below and would appreciate any others that you can point out.

The three most common of these assumptions are:
1) That it will have the same motivations as humans, but with a tendency toward the worst that we show.

I don't believe that I'm doing this. I believe that goal-seeking in general tends to be optimized by certain behaviors (the Omohundro drives). I believe that humans show many of these behaviors because they are relatively optimal compared with the alternatives (and because humans are relatively optimal). But I also believe that the AGI will have dramatically different motivations from humans wherever the human motivations were evolved stepping stones: necessary and optimal on the path through one environment, but not yet eliminated now that they are unnecessary and sub-optimal in the current environment/society (Richard's "the worst that we show").

I am in complete disagreement with Omohundro's idea that there is a canonical set of drives.

This is like saying that there is a canonical set of colors that AGIs will come in: Cambridge Blue, Lemon Yellow and True Black.

What color the thing is will be what color you decide to paint it!

Ditto for its goals and motivations: what you decide to put into it is what it does, so I cannot make any sense of statements like "I also believe that the AGI will also have dramatically different motivations from humans". The answer is Yes if you put that kind of weird motivation system into it, and No if you put a human-like motivation system into it.

Are you assuming that when an AGI is built, we will have to wait until we switch it on before we have any clue what its motivations will be?


2) That it will have some kind of "Gotta Optimize My Utility Function" motivation.

I agree with the statement but I believe that it is a logical follow-on to my assumption that the AGI is a goal-seeking entity (i.e. it's an Omohundro drive). Would you agree, Richard?

3) That it will have an intrinsic urge to increase the power of its own computational machinery.

Again, I agree with the statement but I believe that it is a logical follow-on to my single initial assumption (i.e. it's another Omohundro drive). Wouldn't you agree?

Well, on both counts, not really.

An MES system does not have a Utility Function. Also, an MES system that was (e.g.) set up to have human-empathy motivations would not be obsessed with the desire to increase its computational machinery. At least, it would not do so to the exclusion of its other motivations.
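Here is a minimal sketch of that point (entirely my own invention, not Richard's design): an MES need not collapse its motivations into one scalar utility to be maximized. Instead, each candidate action can be checked against several motivations at once, so that something like computational growth is pursued only when it does not trample the other motivations.

```python
# Hypothetical sketch: multiple motivations as constraints, no scalar utility.

def mes_choose(actions, motivations):
    """Keep only actions acceptable to *every* motivation.

    There is no single utility function being maximized here; an action
    that any one motivation vetoes is simply dropped.
    """
    return [a for a in actions if all(m(a) >= 0 for m in motivations)]

def empathy(action):
    # vetoes anything that harms humans
    return -1 if action.get("harms_humans") else 0

def growth(action):
    # mildly favors gaining computational resources, but cannot override a veto
    return 1 if action.get("adds_compute") else 0

actions = [
    {"name": "seize all datacenters", "adds_compute": True, "harms_humans": True},
    {"name": "buy more servers", "adds_compute": True, "harms_humans": False},
]
chosen = mes_choose(actions, [empathy, growth])
# only "buy more servers" survives: growth is pursued, but not to the
# exclusion of the empathy motivation
```

The design choice this illustrates: motivations act as concurrent filters on behavior rather than terms summed into one objective, which is why there is no "Gotta Optimize My Utility Function" drive to speak of.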


There are other assumptions, but these seem to be the big three.

And I would love to go through all of them, actually (or debate one of my answers above).

There may be a misunderstanding about why I listed them: I really just wanted to give examples of assumptions whose consequences have not been explicitly examined.

My above discussion of how the assumptions can have wildly diverging consequences is probably enough of a debate-starter to be going on with.

So this is my claim, in summary:

1) The statement "Assumption - The AGI will be a goal-seeking entity" is not yet specific enough to yield predictions about how the system will behave, since (at the very least) this statement can be taken to include both the "Goal-Stack" type of drive and the "Motivational Emotional System" type, and these two have wildly different properties.

2) If you mean to refer to a simple Goal-Stack system, then my previous critiques apply: in this case, it is not clear that any AGI built using a GS would be able to function well enough to make it to adulthood. If my critiques are valid, then we need not consider the behavior of GS-type AGI systems, because there will never be any such systems.

3) Any statement that says "An AGI will probably behave like X" is strictly without content unless some mention is made of what motivations or goals were put into the system in the first place - and without such a qualifier, the statement is tantamount to speculation about what color it will be without saying what color we chose to paint it.


Does this make sense?

I don't think I have avoided your other questions (both above and below), I am just trying to package my response in this one set of points.



Richard Loosemore



So what I hear is a series of statements <snip> (Except, of course, that nobody is actually coming right out and saying what color of AGI they assume.)

I thought that I pretty explicitly was . . . .         :-(

In the past I have argued strenuously that (a) you cannot divorce a discussion of friendliness from a discussion of what design of AGI you are talking about,

And I have reached the conclusion that you are somewhat incorrect. I believe that goal-seeking entities OF ANY DESIGN of sufficient intelligence (goal-achieving ability) will see an attractor in my particular vision of Friendliness (which I'm deriving by *assuming* the attractor and working backwards from there -- which I guess you could call a second assumption if you *really* had to ;-).

and (b) some assumptions about AGI motivation are extremely incoherent.

If you perceive me as incoherent, please point out where. My primary AGI motivation is "self-interest" (defined as achievement of *MY* goals -- which directly derives from my assumption that "the AGI will be a goal-seeking entity"). All other motivations are clearly logically derived from that primary motivation. If you see an example where this doesn't appear to be the case, *please* flag it for me (since I need to fix it :-).

And yet, in spite of all the efforts I have made, there seems to be no acknowledgement of the importance of these two points.

I think that I've acknowledged both in the past and will continue to do so (despite the fact that I am now somewhat debating the first point -- more the letter than the spirit :-).

-------------------------------------------
agi
Archives: http://www.listbox.com/member/archive/303/=now
RSS Feed: http://www.listbox.com/member/archive/rss/303/