First off -- yours was a really helpful post. Thank you!
I think that I need to add a word to my initial assumption . . . .
Assumption - The AGI will be an optimizing goal-seeking entity.
> There are two main things.
> One is that the statement "The AGI will be a goal-seeking entity" has
> many different interpretations, and I am arguing that these different
> interpretations have a massive impact on what kind of behavior you can
> expect to see.
I disagree that it has many interpretations. I am willing to agree that my
original assumption phrase didn't sufficiently circumscribe the available space
of entities to justify some of my further reasoning (most particularly because
Omohundro's drives *ASSUME* an optimizing entity -- my bad for not picking that
up before :-).
> The
> MES system, on the other hand, can be set up to have values such as ours
> and to feel empathy with human beings, and once set up that way you
> would have to re-grow the system before you could get it to have some
> other set of values.
To the extent that the MES system is unable (or constrained in its ability) to
massively, and possibly dangerously, optimize itself, it is indeed less subject
to my reasoning. On the other hand, to the extent that the MES system *IS* able
to optimize itself, I would contend that my Omohundro-drive-based reasoning is
valid and correct.
> Clearly, these two interpretations of "The AGI will be a goal-seeking
> entity" have such different properties that, unless there is detailed
> clarification of what the meaning is, we cannot continue to discuss what
> they would do.
Hopefully my statement just above will convince you that we can continue since
we really aren't arguing different properties -- merely the degree to which a
system can self-optimize. That should not prevent a useful discussion.
> My second point is that some possible choices of the meaning of "The AGI
> will be a goal-seeking entity" will actually not cash out into a
> coherent machine design, so we would be wasting our time if we
> considered how that kind of AGI would behave.
I disagree. Even if 50% of the possible choices can't be implemented, I still
believe that we should investigate the class as a whole. It has interesting
characteristics that lead me to believe that the remaining 50% of implementable
choices may hit the jackpot.
> In particular, there are severe doubts about whether the Goal-Stack type
> of system can ever make it up to the level of a full intelligence.
Ah. But this is an intelligence argument rather than a Friendliness argument
and doubly irrelevant because I am neither proposing nor assuming a goal-stack.
I prefer your system of a large, diffuse set of (often but not always simple)
goals and constraints and don't believe it to be at all contrary to what I am
envisioning. I particularly like it because *I BELIEVE* that such an approach
is much more likely to produce a safe, orderly/smooth transition into my
Friendliness attractor than a relatively easily breakable Goal-Stack system.
> I'll go one further on that: I think that one of the main reasons we have
> trouble getting AI systems to be AGI is precisely because we have not
> yet realised that they need to be driven by something more than a Goal
> Stack. It is not the only reason, but it's a big one.
I agree with you (but it's still not relevant to my argument :-).
> So the message is: we need to know the exact details of the AGI's
> motivation system ("The AGI will be a goal-seeking entity" is not
> specific enough), and we need to then be sure that the details we give
> are going to lead to a type of AGI that can actually be an AGI.
No, we don't need to know the details. I'm contending that my vision/theory
applies regardless of the details. If you don't believe so, please supply
contrary details and I'll do whatever is necessary to handle them. :-)
> These questions, I think, are the real battleground.
We'll see . . . . :-)
> BTW, this is not a direct attack on what you were saying,
Actually, I prefer a direct attack :-). I should have declared Crocker's
rules with the "Waste of my time" exception (i.e. I reserve the right to be
rude to anyone who both is rude *and* wastes my time :-).
> My problem is that so much of the current discussion is tangled up with
> hidden assumptions that I think that the interesting part of your
> message is getting lost.
So let's drag those puppies into the light! This is not an easy message. It
touches on (and, I believe, revises) one helluva lot. That's why I laugh when
someone just wants a link to the "completed" paper. Trust me -- the wording on
the "completed" paper changes virtually every time there is an e-mail on the
subject. And I *don't* want people skipping ahead to the punch line if I'm not
explaining it well enough at the beginning -- because the whole blasted thing
needs to be clear and coherent and enjoyable so that other people won't just
quit in the middle.
> I am in complete disagreement with Omohundro's idea that there is a
> canonical set of drives.
Ah! A simple, clear, coherent point of disagreement. Are you still in complete
disagreement when the word optimizing is added? If so, could you please go
into more details as to why? If I can't make the case for Omohundro's drives,
a good portion of my argument/vision *does* collapse so I *do* need to deal
with this.
> Ditto for its goals and motivations: what you decide to put into it is
> what it does, so I cannot make any sense of statements like "I also
> believe that the AGI will also have dramatically different motivations
> from humans". Answer is Yes if you put that kind of weird motivation
> system into it, and No if you put a human-like motivation system into it.
"What you decide to put into it is what it does". Well, NO! Not if I can
convince you of Omohundro drives . . . . If I can convince you of the
universality of certain things, then I can derive a lot more from those things.
Putting a human-like motivation system into an entity that can optimize will
eventually, I believe, (given enough time and barring stagnation or
destruction) lead to Friendliness (this is true of humans as well as everything
else -- and is the point of the Attractor theory). But Friendliness is
directly contrary to some human motivations and includes motivations that
humanity does not. Thus, future humans are going to have "dramatically
different motivations from <present-day> humans". Are you willing to accept
these statements or can you point out a flaw in my reasoning or explanation
(please :-).
> Are you assuming that when an AGI is built, we will have to wait until
> we switch it on before we have any clue what its motivations will be?
Absolutely not. I want to switch it on knowing that it *KNOWS* that
Friendliness is the super-meta-subgoal that will promote all of its possible
goals so it will be *EXTREMELY* motivated to be Friendly (and motivated to
return to Friendliness whenever it accidentally deviates).
> An MES system does not have a Utility Function.
An MES system does not have an explicit utility function that it refers to;
however, there is certainly a (probably complex) utility function that
accurately "describes" its behavior (even if it's just the sum of the
collection of goals and constraints).
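To make that concrete, here is a minimal sketch (all names, goals, and numbers
are hypothetical, invented purely for illustration) of how an outside observer
could "describe" such a system with an implicit utility that is just the sum of
its goal satisfactions minus penalties for any violated constraints:

```python
# Hypothetical sketch: an MES-style agent consults no explicit utility
# function, only a diffuse collection of goals and constraints -- yet an
# implicit utility can still be *described* from the outside as a sum.

def describe_implicit_utility(goals, constraints, state):
    """Sum each goal's satisfaction; subtract each violated constraint's penalty."""
    utility = sum(goal(state) for goal in goals)
    utility -= sum(penalty for check, penalty in constraints if not check(state))
    return utility

# Toy example: two simple goals and one constraint over a state dict.
goals = [
    lambda s: s["humans_helped"],          # empathy-like goal
    lambda s: 0.1 * s["compute"],          # mild resource-acquisition goal
]
constraints = [
    (lambda s: s["harm"] == 0, 100.0),     # heavy penalty for any harm
]

state = {"humans_helped": 5, "compute": 10, "harm": 0}
print(describe_implicit_utility(goals, constraints, state))  # 6.0
```

The point of the sketch is only that the description is external: nothing in
the agent's machinery need ever compute or reference this sum.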
> Also, an MES system
> that was (e.g.) set up to have human-empathy motivations would not be
> obsessed with the desire to increase its computational machinery. At
> least, it would not do so to the exclusion of its other motivations.
I agree completely. I was very careful to repeatedly use the clause that the
Omohundro drives apply "with the exception, of course, of those that directly
contradict their goals." If human empathy were your primary goal, you would not
be obsessed with the desire to increase your computational machinery to the
detriment of that goal. However, if you were intelligent and it wouldn't
directly interfere with your promoting humans, you certainly *WOULD* want to
increase your computational machinery because doing so would increase your odds
of success in achieving and sustaining your primary goal of promoting humans.
As I said above, Omohundro drives are effectively universal super-meta-subgoals.
> So this is my claim, in summary:
> 1) The statement "Assumption - The AGI will be a goal-seeking entity"
> is not yet specific enough to yield predictions about how the system
> will behave, since (at the very least) this statement can be taken to
> include both the "Goal-Stack" type of drive and the "Motivational
> Emotional System" type, and these two have wildly different properties.
Agreed. I added the word optimizing to my base assumption and argued that both
Goal-Stack and Motivational-Emotional-System AGIs should be handled by my
arguments despite the fact that they have wildly different properties.
> 2) If you mean to refer to a simple Goal-Stack system, then my previous
> critiques apply: in this case, it is not clear that any AGI built using
> a GS would be able to function well enough to make it to adulthood. If
> my critiques are valid, then we need not consider the behavior of
> GS-type AGI systems, because there will never be any such systems.
I emphatically do NOT mean to limit my arguments to a simple Goal-Stack system.
> 3) Any statement that says "An AGI will probably behave like X" is
> strictly without content unless some mention is made of what motivations
> or goals were put into the system in the first place - and without
> such a qualifier, the statement is tantamount to speculation about what
> color it will be without saying what color we chose to paint it.
I started out saying "agreed", but maybe it would be clearer to get to my point
by emphatically disagreeing, since I just realized that you (and probably I,
previously :-) are conflating an important distinction.
The motivation that is in the system is "I want to achieve *my* goals".
The goals that are in the system I deem to be entirely irrelevant UNLESS they
are deliberately and directly contrary to Friendliness. I am contending that,
unless the initial goals are deliberately and directly contrary to
Friendliness, an optimizing system's motivation of achieving *my* goals (over a
large enough set of goals) will eventually cause it to converge on the goal of
Friendliness, since Friendliness is the universal super-meta-subgoal of all its
other goals (and its optimizing will also drive it up to the necessary
intelligence to understand Friendliness). Of course, it may take a
while since we humans are still in the middle of it . . . . but hopefully we're
almost there. ;-)
> I don't think I have avoided your other questions (both above and
> below), I am just trying to package my response in this one set of points.
Nope. You're being *extremely* direct, lucid, and helpful in clarifying my
thoughts. Thank you!
Mark