Mark Waser wrote:
I am in sympathy with some aspects of Mark's position, but I also see
a serious problem running through the whole debate: everyone is
making statements based on unstated assumptions about the motivations
of AGI systems.
Bummer. I thought that I had been clearer about my assumptions. Let me
try to point them out concisely again, and see if you can show me any
additional assumptions that I am not aware I am making (which I would
appreciate very much).
Assumption - The AGI will be a goal-seeking entity.
And I think that is it. :-)
Okay, I can use that as an illustration of what I am getting at.
There are two main things.
One is that the statement "The AGI will be a goal-seeking entity" has
many different interpretations, and I am arguing that these different
interpretations have a massive impact on what kind of behavior you can
expect to see.
It is almost impossible to list all the different interpretations, but
two of the more extreme variants are the ones I have described before: a
"Goal-Stack" system, in which the goals are represented in the same form
as the knowledge that the system stores, and a "Motivational Emotional
System", which biases the functioning of the system and is intimately
connected with the development of its knowledge. The GS system has the
dangerous feature that any old fool could go in and rewrite the top-level
goal so it reads "make as much computronium as possible" or "cultivate
dandelions" or "learn how to do crochet". The MES system, on the other
hand, can be set up to have values such as ours and to feel empathy with
human beings, and once set up that way you would have to re-grow the
system before you could get it to have some other set of values.
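To make the contrast concrete, here is a toy Python sketch of the two
interpretations. This is my own illustrative caricature, not a real
design from either camp: all the class and method names are hypothetical,
and the MES "growing" step is a stand-in for a long developmental
process.

    # Toy caricature of the two interpretations; all names hypothetical.

    class GoalStackAGI:
        """Goals are stored in the same editable form as ordinary
        knowledge, so the top-level goal is just another data entry."""

        def __init__(self):
            self.knowledge = {}                  # declarative facts
            self.goal_stack = ["be beneficial"]  # entry 0 drives everything

        def rewrite_top_goal(self, goal):
            # The dangerous feature: any old fool can do this in one line.
            self.goal_stack[0] = goal

    class MESAgent:
        """Values act as diffuse biases entangled with everything the
        system learns; there is no single goal slot to overwrite."""

        def __init__(self, value_biases):
            # Changing the values later would mean re-growing the whole
            # model under different biases, not editing one field.
            self.model = self._grow(value_biases)

        def _grow(self, biases):
            # Stand-in for a long developmental process in which the
            # biases shape every representation that gets learned.
            return {"biases": biases, "knowledge": "entangled with values"}

    agi = GoalStackAGI()
    agi.rewrite_top_goal("cultivate dandelions")  # that is the whole attack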
Clearly, these two interpretations of "The AGI will be a goal-seeking
entity" have such different properties that, unless there is detailed
clarification of what the meaning is, we cannot continue to discuss what
they would do.
My second point is that some possible choices of the meaning of "The AGI
will be a goal-seeking entity" will actually not cash out into a
coherent machine design, so we would be wasting our time if we
considered how that kind of AGI would behave.
In particular, there are severe doubts about whether the Goal-Stack type
of system can ever make it up to the level of a full intelligence. I'll
go one further on that: I think that one of the main reasons we have
trouble getting AI systems to be AGI is precisely because we have not
yet realised that they need to be driven by something more than a Goal
Stack. It is not the only reason, but it's a big one.
So the message is: we need to know the exact details of the AGI's
motivation system ("The AGI will be a goal-seeking entity" is not
specific enough), and we need to then be sure that the details we give
are going to lead to a type of AGI that can actually be an AGI.
These questions, I think, are the real battleground.
BTW, this is not a direct attack on what you were saying, because I
believe that there is a version of what you are saying (about an
intrinsic tendency toward a Friendliness attractor) that I agree with.
My problem is that so much of the current discussion is tangled up with
hidden assumptions that I think that the interesting part of your
message is getting lost.
EVERYTHING depends on what assumptions you make, and yet each voice in
this debate is talking as if their own assumption can be taken for
granted.
I agree with you and am really trying to avoid this. I will address
your specific examples below and would appreciate any others that you
can point out.
The three most common of these assumptions are:
1) That it will have the same motivations as humans, but with a
tendency toward the worst that we show.
I don't believe that I'm doing this. I believe that goal-seeking in
general tends to be optimized by certain behaviors (the Omohundro
drives). I believe that humans show many of these behaviors because
these behaviors are relatively optimal in relation to the alternatives
(and because humans are relatively optimal). But I also believe that the
AGI will have dramatically different motivations from humans wherever
the human motivations were evolved stepping stones -- necessary and
optimal on the path through one environment, but never eliminated even
though they are now unnecessary and sub-optimal in the current
environment/society (Richard's "the worst that we show").
I am in complete disagreement with Omohundro's idea that there is a
canonical set of drives.
This is like saying that there is a canonical set of colors that AGIs
will come in: Cambridge Blue, Lemon Yellow and True Black.
What color the thing is will be what color you decide to paint it!
Ditto for its goals and motivations: what you decide to put into it is
what it does, so I cannot make any sense of statements like "I also
believe that the AGI will have dramatically different motivations from
humans". The answer is Yes if you put that kind of weird motivation
system into it, and No if you put a human-like motivation system into it.
Are you assuming that when an AGI is built, we will have to wait until
we switch it on before we have any clue what its motivations will be?
2) That it will have some kind of "Gotta Optimize My Utility
Function" motivation.
I agree with the statement but I believe that it is a logical follow-on
to my assumption that the AGI is a goal-seeking entity (i.e. it's an
Omohundro drive). Would you agree, Richard?
3) That it will have an intrinsic urge to increase the power of its
own computational machinery.
Again, I agree with the statement but I believe that it is a logical
follow-on to my single initial assumption (i.e. it's another Omohundro
drive). Wouldn't you agree?
Well, on both counts, not really.
An MES system does not have a Utility Function. Also, an MES system
that was (e.g.) set up to have human-empathy motivations would not be
obsessed with the desire to increase its computational machinery. At
least, it would not do so to the exclusion of its other motivations.
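As a purely illustrative sketch of that last point (toy Python of my
own; the veto/floor mechanism is my assumption, not part of anyone's
actual proposal): a GS agent maximizes a single scalar outright, while
the MES caricature lets every motivation veto actions it finds
unacceptable, so no one drive -- "acquire more computational machinery",
say -- is pursued to the exclusion of the others.

    # Toy contrast: single-utility maximization vs. mutually
    # constraining motivations. Purely illustrative.
    import random

    def gs_choose(actions, utility):
        # Goal-Stack caricature: one scalar, maximized without limit.
        return max(actions, key=utility)

    def mes_choose(actions, motivations, floor=-0.5):
        # MES caricature: each motivation (empathy, curiosity, resource
        # acquisition, ...) scores every candidate action, and any
        # motivation can veto an action it scores below the floor.
        # The agent then satisfices among the survivors rather than
        # squeezing the last drop out of any single drive.
        acceptable = [a for a in actions
                      if all(score(a) >= floor for score in motivations)]
        return random.choice(acceptable) if acceptable else None

Note that even here the "utility function" language only fits the GS
caricature; the MES version never computes one number to maximize.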
There are other assumptions, but these seem to be the big three.
And I would love to go through all of them, actually (or debate one of
my answers above).
There may be a misunderstanding about why I listed them: I really just
wanted to give examples of assumptions whose consequences have not been
explicitly examined.
My above discussion of how the assumptions can have wildly diverging
consequences is probably enough of a debate-starter to be going on with.
So this is my claim, in summary:
1) The statement "Assumption - The AGI will be a goal-seeking entity"
is not yet specific enough to yield predictions about how the system
will behave, since (at the very least) this statement can be taken to
include both the "Goal-Stack" type of drive and the "Motivational
Emotional System" type, and these two have wildly different properties.
2) If you mean to refer to a simple Goal-Stack system, then my previous
critiques apply: in this case, it is not clear that any AGI built using
a GS would be able to function well enough to make it to adulthood. If
my critiques are valid, then we need not consider the behavior of
GS-type AGI systems, because there will never be any such systems.
3) Any statement that says "An AGI will probably behave like X" is
strictly without content unless some mention is made of what motivations
or goals were put into the system in the first place - and without
such a qualifier, the statement is tantamount to speculation about what
color it will be without saying what color we chose to paint it.
Does this make sense?
I don't think I have avoided your other questions (both above and
below); I am just trying to package my response in this one set of points.
Richard Loosemore
So what I hear is a series of statements <snip> (Except, of course,
that nobody is actually coming right out and saying what color of AGI
they assume.)
I thought that I pretty explicitly was . . . . :-(
In the past I have argued strenuously that (a) you cannot divorce a
discussion of friendliness from a discussion of what design of AGI you
are talking about,
And I have reached the conclusion that you are somewhat incorrect. I
believe that goal-seeking entities OF ANY DESIGN of sufficient
intelligence (goal-achieving ability) will see an attractor in my
particular vision of Friendliness (which I'm deriving by *assuming* the
attractor and working backwards from there -- which I guess you could
call a second assumption if you *really* had to ;-).
and (b) some assumptions about AGI motivation are extremely incoherent.
If you perceive me as incoherent, please point out where. My primary
AGI motivation is "self-interest" (defined as achievement of *MY* goals
-- which directly derives from my assumption that "the AGI will be a
goal-seeking entity"). All other motivations are clearly logically
derived from that primary motivation. If you see an example where this
doesn't appear to be the case, *please* flag it for me (since I need to
fix it :-).
And yet, in spite of all the efforts I have made, there seems to be no
acknowledgement of the importance of these two points.
I think that I've acknowledged both in the past and will continue to do
so (despite the fact that I am now somewhat debating the first point --
more the letter than the spirit :-).