First off -- yours was a really helpful post. Thank you!
I think that I need to add a word to my initial assumption . . . .
Assumption - The AGI will be an optimizing goal-seeking entity.
> There are two main things.
> One is that the statement "The AGI will be a goal-seeking entity" has
> many different interpretations, and I am arguing that these different
> interpretations have a massive impact on what kind of behavior you can
> expect to see.
I disagree that it has many interpretations. I am willing to agree that my
original assumption phrase didn't sufficiently circumscribe the available space
of entities to justify some of my further reasoning (most particularly because
Omohundro's drives *ASSUME* an optimizing entity -- my bad for not picking that
up before :-).
> The
> MES system, on the other hand, can be set up to have values such as ours
> and to feel empathy with human beings, and once set up that way you
> would have to re-grow the system before you could get it to have some
> other set of values.
To the extent that the MES system is unable (or constrained in its ability) to
massively, and possibly dangerously, optimize itself, it is indeed less subject
to my reasoning. On the other hand, to the extent that the MES system *IS* able
to optimize itself, I would contend that my Omohundro-drive-based reasoning is
valid and correct.
> Clearly, these two interpretations of "The AGI will be a goal-seeking
> entity" have such different properties that, unless there is detailed
> clarification of what the meaning is, we cannot continue to discuss what
> they would do.
Hopefully my statement just above will convince you that we can continue since
we really aren't arguing different properties -- merely the degree to which a
system can self-optimize. That should not prevent a useful discussion.
> My second point is that some possible choices of the meaning of "The AGI
> will be a goal-seeking entity" will actually not cash out into a
> coherent machine design, so we would be wasting our time if we
> considered how that kind of AGI would behave.
I disagree. Even if 50% of the possible choices can't be implemented, I still
believe that we should investigate the class as a whole. It has interesting
characteristics that lead me to believe that the remaining 50% of implementable
choices may hit the jackpot.
> In particular, there are severe doubts about whether the Goal-Stack type
> of system can ever make it up to the level of a full intelligence.
Ah. But this is an intelligence argument rather than a Friendliness argument
and doubly irrelevant because I am neither proposing nor assuming a goal-stack.
I prefer your system of a large, diffuse set of (often but not always simple)
goals and constraints and don't believe it to be at all contrary to what I am
envisioning. I particularly like it because *I BELIEVE* that such an approach
is much more likely to produce a safe, orderly/smooth transition into my
Friendliness attractor than a relatively easily breakable Goal-Stack system.
> I'll go one further on that: I think that one of the main reasons we have
> trouble getting AI systems to be AGI is precisely because we have not
> yet realised that they need to be driven by something more than a Goal
> Stack. It is not the only reason, but it's a big one.
I agree with you (but it's still not relevant to my argument :-).
> So the message is: we need to know the exact details of the AGI's
> motivation system ("The AGI will be a goal-seeking entity" is not
> specific enough), and we need to then be sure that the details we give
> are going to lead to a type of AGI that can actually be an AGI.
No, we don't need to know the details. I'm contending that my vision/theory
applies regardless of the details. If you don't believe so, please supply
contrary details and I'll do whatever is necessary to handle them. :-)
> These questions, I think, are the real battleground.
We'll see . . . . :-)
> BTW, this is not a direct attack on what you were saying,
Actually, I prefer a direct attack :-). I should have declared Crocker's
rules with the "Waste of my time" exception (i.e. I reserve the right to be
rude to anyone who both is rude *and* wastes my time :-).
> My problem is that so much of the current discussion is tangled up with
> hidden assumptions that I think that the interesting part of your
> message is getting lost.
So let's drag those puppies into the light! This is not an easy message. It
touches on (and, I believe, revises) one helluva lot. That's why I laugh when
someone just wants a link to the "completed" paper. Trust me -- the wording on
the "completed" paper changes virtually every time there is an e-mail on the
subject. And I *don't* want people skipping ahead to the punch line if I'm not
explaining it well enough at the beginning -- because the whole blasted thing
needs to be clear and coherent and enjoyable so that other people won't just
quit in the middle.
> I am in complete disagreement with Omohundro's idea that there is a
> canonical set of drives.
Ah! A simple, clear, coherent point of disagreement. Are you still in complete
disagreement when the word optimizing is added? If so, could you please go
into more details as to why? If I can't make the case for Omohundro's drives,
a good portion of my argument/vision *does* collapse so I *do* need to deal
with this.
> Ditto for its goals and motivations: what you decide to put into it is
> what it does, so I cannot make any sense of statements like "I also
> believe that the AGI will also have dramatically different motivations
> from humans". Answer is Yes if you put that kind of weird motivation
> system into it, and No if you put a human-like motivation system into it.
"What you decide to put into it is what it does". Well, NO! Not if I can
convince you of Omohundro drives . . . . If I can convince you of the
universality of certain things, then I can derive a lot more from those things.
Putting a human-like motivation system into an entity that can optimize will
eventually, I believe, (given enough time and barring stagnation or
destruction) lead to Friendliness (this is true of humans as well as everything
else -- and is the point of the Attractor theory). But Friendliness is
directly contrary to some human motivations and includes motivations that
humanity does not. Thus, future humans are going to have "dramatically
different motivations from <present-day> humans". Are you willing to accept
these statements or can you point out a flaw in my reasoning or explanation
(please :-).
> Are you assuming that when an AGI is built, we will have to wait until
> we switch it on before we have any clue what its motivations will be?
Absolutely not. I want to switch it on knowing that it *KNOWS* that
Friendliness is the super-meta-subgoal that will promote all of its possible
goals so it will be *EXTREMELY* motivated to be Friendly (and motivated to
return to Friendliness whenever it accidentally deviates).
> An MES system does not have a Utility Function.
An MES system does not have an explicit utility function that it refers to;
however, there is certainly a (probably complex) utility function that
accurately "describes" its behavior (even if it's just the sum of the
collection of goals and constraints).
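To make that concrete, here is a minimal sketch (all names, goals, and numbers
are hypothetical, invented purely for illustration) of how an outside observer
could "describe" such a system with an implicit utility that is just the sum of
its goal satisfactions minus penalties for any violated constraints:

```python
# Hypothetical sketch: an MES-style agent consults no explicit utility
# function, only a diffuse collection of goals and constraints -- yet an
# implicit utility can still be *described* from the outside as a sum.

def describe_implicit_utility(goals, constraints, state):
    """Sum each goal's satisfaction; subtract each violated constraint's penalty."""
    utility = sum(goal(state) for goal in goals)
    utility -= sum(penalty for check, penalty in constraints if not check(state))
    return utility

# Toy example: two simple goals and one constraint over a state dict.
goals = [
    lambda s: s["humans_helped"],          # empathy-like goal
    lambda s: 0.1 * s["compute"],          # mild resource-acquisition goal
]
constraints = [
    (lambda s: s["harm"] == 0, 100.0),     # heavy penalty for any harm
]

state = {"humans_helped": 5, "compute": 10, "harm": 0}
print(describe_implicit_utility(goals, constraints, state))  # 6.0
```

The point of the sketch is only that the description is external: nothing in
the agent's machinery need ever compute or reference this sum.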
> Also, an MES system
> that was (e.g.) set up to have human-empathy motivations would not be
> obsessed with the desire to increase its computational machinery. At
> least, it would not do so to the exclusion of its other motivations.
I agree completely. I was very careful to repeatedly use the clause that the
Omohundro drives apply "with the exception, of course, of those that directly
contradict their goals." If human empathy were your primary goal, you would not
be obsessed with the desire to increase your computational machinery to the
detriment of that goal. However, if you were intelligent and it wouldn't
directly interfere with your promoting humans, you certainly *WOULD* want to
increase your computational machinery because doing so would increase your odds
of success in achieving and sustaining your primary goal of promoting humans.
As I said above, Omohundro drives are effectively universal super-meta-subgoals.
> So this is my claim, in summary:
> 1) The statement "Assumption - The AGI will be a goal-seeking entity"
> is not yet specific enough to yield predictions about how the system
> will behave, since (at the very least) this statement can be taken to
> include both the "Goal-Stack" type of drive and the "Motivational
> Emotional System" type, and these two have wildly different properties.
Agreed. I added the word optimizing to my base assumption and argued that both
Goal-Stack and Motivational-Emotional-System AGIs should be handled by my
arguments despite the fact that they have wildly different properties.
> 2) If you mean to refer to a simple Goal-Stack system, then my previous
> critiques apply: in this case, it is not clear that any AGI built using
> a GS would be able to function well enough to make it to adulthood. If
> my critiques are valid, then we need not consider the behavior of
> GS-type AGI systems, because there will never be any such systems.
I emphatically do NOT mean to limit my arguments to a simple Goal-Stack system.
> 3) Any statement that says "An AGI will probably behave like X" is
> strictly without content unless some mention is made of what motivations
> or goals were put into the system in the first place - and without
> such a qualifier, the statement is tantamount to speculation about what
> color it will be without saying what color we chose to paint it.
I started out saying "agreed", but maybe it would be clearer to get to my point
by emphatically disagreeing, since I just realized that you (and probably I,
previously :-) are conflating an important distinction.
The motivation that is in the system is "I want to achieve *my* goals".
The goals that are in the system I deem to be entirely irrelevant UNLESS they
are deliberately and directly contrary to Friendliness. I am contending that,
unless the initial goals are deliberately and directly contrary to
Friendliness, an optimizing system's motivation of achieving *my* goals (over a
large enough set of goals) will eventually cause it to converge on the goal of
Friendliness, since Friendliness is the universal super-meta-subgoal of all its
other goals (and its optimizing will also drive it up to the necessary
intelligence to understand Friendliness). Of course, it may take a
while since we humans are still in the middle of it . . . . but hopefully we're
almost there. ;-)
> I don't think I have avoided your other questions (both above and
> below), I am just trying to package my response in this one set of points.
Nope. You're being *extremely* direct, lucid, and helpful in clarifying my
thoughts. Thank you!
Mark