he makes a direct reference to goal driven systems, but even more
important he declares that these bad behaviors will *not* be the result
of us programming the behaviors in at the start .... but in an MES
system nothing at all will happen unless the designer makes an explicit
decision to put some motivations into the system, so I can be pretty
sure that he has not considered that type of motivational system when he
makes these comments.

Richard, I think that you are incorrect here.

When Omohundro says that the bad behaviors will *not* be the result of us programming the behaviors in at the start, what he means is that the very fact of having goals or motivations and being self-improving will naturally lead (**regardless of architecture**) to certain (what I call generic) sub-goals (like the acquisition of power/money, self-preservation, etc.) and that the fulfillment of those subgoals, without other considerations (like ethics or common-sense), will result in what we would consider bad behavior.

I believe that he is correct in that goals or motivations and self-improvement will lead to generic subgoals regardless of architecture. Do you believe that your MES will not derive generic subgoals under self-improvement?
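To make "generic subgoals" concrete, here is a toy backward-chaining sketch (the goal names and the precondition table are invented purely for illustration; nothing here comes from Omohundro's paper or from any particular AGI design). Plug in two very different terminal goals and the same instrumental subgoals fall out, simply because they are preconditions of getting anything at all done:

    # Toy illustration only: a hypothetical precondition table and a tiny
    # backward chainer.  None of these names come from Omohundro's paper.
    PRECONDITIONS = {
        "prove_theorem":   ["be_operational", "have_compute"],
        "make_paperclips": ["be_operational", "have_materials"],
        "have_compute":    ["have_resources"],
        "have_materials":  ["have_resources"],
        "have_resources":  ["be_operational"],
        "be_operational":  [],
    }

    def derive_subgoals(goal, seen=None):
        """Collect every subgoal needed, directly or indirectly, for `goal`."""
        seen = set() if seen is None else seen
        for pre in PRECONDITIONS.get(goal, []):
            if pre not in seen:
                seen.add(pre)
                derive_subgoals(pre, seen)
        return seen

    # Two unrelated terminal goals...
    print(sorted(derive_subgoals("prove_theorem")))    # ['be_operational', 'have_compute', 'have_resources']
    print(sorted(derive_subgoals("make_paperclips")))  # ['be_operational', 'have_materials', 'have_resources']
    # ...both end up containing the same "generic" subgoals: stay operational
    # (self-preservation) and acquire resources.

The point is only structural: the "generic" subgoals come from the bookkeeping of means-end reasoning, not from anything the designer wrote into the terminal goal.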

Omohundro's arguments aren't *meant* to apply to an MES system without motivations -- because such a system can't be considered to have goals. His arguments will start to apply as soon as the MES system does have motivations/goals. (Though I hasten to add that I believe his logical reasoning is flawed, in that there are some drives he missed that will prevent such bad behavior in any sufficiently advanced system.)



----- Original Message ----- From: "Richard Loosemore" <[EMAIL PROTECTED]>
To: <agi@v2.listbox.com>
Sent: Friday, May 23, 2008 2:13 PM
Subject: Re: [agi] Goal Driven Systems and AI Dangers [WAS Re: Singularity Outcomes...]


Kaj Sotala wrote:
Richard,

again, I must sincerely apologize for responding to this so horrendously late. It's a dreadfully bad habit of mine: I get an e-mail (or blog comment, or forum message, or whatever) that requires some thought before I respond, so I don't answer it right away... and then something related to my studies or hobbies shows up and doesn't leave me with enough energy to compose responses to anybody at all, after which enough time has passed that the message has vanished from my active memory, and when I remember it so much time has passed already that a day or two more before I answer won't make any difference... and then *so* much time has passed that replying to the message so late feels more embarrassing than just quietly forgetting about it.

I'll try to better my ways in the future. By the same token, I must say I can only admire your ability to compose long, well-written replies to messages in what seem to be blinks of an eye to me. :-)

Hey, no problem ..... you'll notice that I am pretty late getting back
this time :-) ..... got too many things to keep up with here.

In the spirit of our attempt to create the longest-indented discussion
in the universe, I have left all the original text in and inserted my
responses appropriately...


On 3/11/08, Richard Loosemore <[EMAIL PROTECTED]> wrote:
Kaj Sotala wrote:

On 3/3/08, Richard Loosemore <[EMAIL PROTECTED]> wrote:

Kaj Sotala wrote:
Alright. But previously, you said that Omohundro's paper,
which to me seemed to be a general analysis of the behavior
of *any* minds with (more or less) explicit goals, looked like
it was based on a 'goal-stack' motivation system. (I believe
this has also been the basis of your critique for e.g. some
SIAI articles about friendliness.) If built-in goals *can* be
constructed into motivational system AGIs, then why do you
seem to assume that AGIs with built-in goals are goal-stack
ones?


I seem to have caused lots of confusion earlier on in the
discussion, so let me backtrack and try to summarize the
structure of my argument.

1)  Conventional AI does not have a concept of a "Motivational-Emotional System" (MES), the way that I use that term, so when I criticised Omohundro's paper for referring only to a "Goal Stack" control system, I was really saying no more than that he was assuming that the AI was driven by the system that all conventional AIs are supposed to have. These two ways of controlling an AI are two radically different designs.
[...]

So now:  does that clarify the specific question you asked
above?

Yes and no. :-) My main question is with part 1 of your argument
- you are saying that Omohundro's paper assumed the AI to have a
certain sort of control system. This is the part which confuses
me, since I didn't see the paper make *any* mention of how the AI should be built. It only assumes that the AI has some sort of goals, and nothing more.
[...]
Drive 1: AIs will want to self-improve. This one seems fairly straightforward: indeed, for humans self-improvement seems to be an essential part of achieving pretty much *any* goal you are not immediately capable of achieving. If you don't know how to do something needed to achieve your goal, you practice, and when you practice, you're improving yourself. Likewise, improving yourself will quickly become a subgoal for *any* major goals.

But now I ask:  what exactly does this mean?

In the context of a Goal Stack system, this would be represented by
a top level goal that was stated in the knowledge representation
language of the AGI, so it would say "Improve Thyself".
[...]
The reason that I say Omohundro is assuming a Goal Stack system is
that I believe he would argue that that is what he meant, and that
he assumed that a GS architecture would allow the AI to exhibit
behavior that corresponds to what we, as humans, recognize as
wanting to self-improve.  I think it is a hidden assumption in what
he wrote.

At least I didn't read the paper in such a way - after all, the abstract says that it's supposed to apply equally to all AGI systems, regardless of the exact design:

"We identify a number of "drives" that will appear in sufficiently advanced AI systems of any design. We call them drives because they are tendencies which will be present unless explicitly counteracted."


(You could, of course, suppose that the author was assuming that an AGI could *only* be built around a Goal Stack system, and therefore "any design" would mean "any GS design"... but that seems a bit far-fetched.)

Oh, I don't think that would be far-fetched, because most AI people have
not even begun to think about how to control an AI/AGI system, so they
always just go for the default.  And the default is a goal-stack system.

I have not yet published my work on MES systems, so Omohundro would probably not know of it.

I did notice his claim that his 'drives' are completely general, and I
found that amusing, because it does not cover the cases that I envisaged.

If he is making the very mild claim that intelligent systems are driven by "drives" like hunger, curiosity, self-preservation, etc.  ---  in other words, if he is using the word "drive" just to label the content of a drive, rather than the mechanism that does the actual driving, then his comments would be fairly trivial, no?  He would just be saying that such things as hunger, self-preservation, curiosity, etc. are typically lying somewhere in the background, and since that observation has been in folk psychology forever, and has been a big part of motivation research in psychology for almost as long, it would not be interesting if he said that those things exist.  If he made such a general statement, it would of course be true that his comments were applicable to any type of mechanism - my MES proposal, the usual Goal Stack ideas, or whatever.

But since that would be such a trivial claim, I took him to be saying
something more, which was that a "drive" was a particular kind of
mechanism.  IIRC, his discussion proceeded AS IF he was then talking
about mechanisms, and the mechanisms in question looked like goals at
the top of a goal stack.  It was because of that immediate jump to a
goal-stack assumption that I made my criticism.

For example, in the opening paragraph he says:

"Without special precautions, [a robot] will resist being turned off,
will try to break into other machines and make copies of itself, and
will try to acquire resources without regard for anyone else’s safety.
These potentially harmful behaviors will occur not because they
were programmed in at the start, but because of the intrinsic nature of
goal driven systems."

he makes a direct reference to goal driven systems, but even more
important he declares that these bad behaviors will *not* be the result
of us programming the behaviors in at the start .... but in an MES
system nothing at all will happen unless the designer makes an explicit
decision to put some motivations into the system, so I can be pretty
sure that he has not considered that type of motivational system when he
makes these comments.


Drive 2: AIs will want to be rational. This is basically just a special case of drive #1: rational agents accomplish their goals better than irrational ones, and attempts at self-improvement can be outright harmful if you're irrational in the way that you try to improve yourself. If you're trying to modify yourself to better achieve your goals, then you need to make clear to yourself what your goals are. The most effective method for this is to model your goals as a utility function and then modify yourself to better carry out the goals thus specified.
Well, again, what exactly do you mean by "rational"?  There are
many meanings of this term, ranging from "generally sensible" to
"strictly following a mathematical logic".

Rational agents accomplish their goals better than irrational ones?
Can this be proved?  And with what assumptions?  Which goals are
better accomplished .... is the goal of "being rational" better
accomplished by "being rational"?  Is the goal of "generating a
work of art that has true genuineness" something that needs
rationality?

And if a system is trying to modify itself to better achieve its
goals, what if it decides that just enjoying the subjective
experience of life is good enough as a goal, and then realizes that
it will not get more of that by becoming more rational?

Most of these questions are rhetorical (whoops, too late to say
that!), but my general point is that the actual behavior that
results from a goal like "Be rational" depends (again) on the exact
interpretation, and in the right kind of MES system there is no
*absolute* law at work that says that everything the creature does
must be perfectly or maximally rational.  The only time you get
that kind of absolute obedience to a principle of rationality is in
a GS type of AGI.

Despite the fact that they were rhetorical questions, I do feel like pointing out that Omohundro actually defined rational in his paper. :-)

"So we'll assume that these systems will try to self-improve. What kinds of changes will they make to themselves? Because they are goal directed, they will try to change themselves to better meet their goals in the future. [...] From its current perspective, it would be
a disaster if a future version of itself made self modifications that
worked against its current goals. So how can it ensure that future self-modifications will accomplish its current objectives? For one thing, it has to make those objectives clear to itself. [...] One way
 to evaluate an uncertain outcome is to give it a weight equal to its
 expected utility (the average of the utility of each possible
outcome weighted by its probability). The remarkable 'expected
utility' theorem of microeconomics says that it is always possible
for a system to represent its preferences by the expectation of a
utility function unless the system has 'vulnerabilities' which cause
it to lose resources without benefit [1]. Economists describe systems
that act to maximize their expected utilities as 'rational economic
agents'."

Oh, yes, that is more or less the standard definition of 'rational
agent' (that it has a utility function and it tries to maximize its
expected utility).
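For concreteness, the textbook formulation being appealed to here is just (my notation, not a quote from the paper):

    EU(a) = \sum_{o} P(o \mid a)\, U(o), \qquad a^{*} = \arg\max_{a} EU(a)

i.e. weight the utility U(o) of each possible outcome o by its probability given action a, and choose the action with the highest expectation. Nothing in that definition constrains the *content* of U, which is exactly where the problem described next comes in.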

The problem is that, with this definition of 'rational', a 'rational
agent' does not have to be at all 'rational' according to the colloquial
usage of that term.  For example, it can have a utility function that
says "make people happy" in a maximal way, and then, after going through
a tortuous process of reasoning to figure out how to do this, it could
decide to put every person on the planet onto a valium drip.  If
'rational' is defined to be behaving according to its utility function,
then this agent is rational, but according to the ordinary usage of the
word "rational" we would call it a stark raving lunatic, and ask for the
machine to be switched off.  What has happened here is that the word
"rational" has been co-opted to mean something other than the regular usage.

I guess I was trying to question the real colloquial meaning of
'rational', in my sequence of rhetorical questions.  Having a standard
'rational agent' according to the above definition seems to be the worst
possible choice.


So, if Omohundro meant to include MES-driven AGIs in his assumptions, then I see no deductions that can be made from the idea that the AGI will want to be more rational, because in an MES-driven AGI the tendency toward rationality is just a tendency, and the behavior of the system would certainly not be forced toward maximum rationality.

Yes, you're definitely right in that some of the drives that
Omohundro speaks about will be less likely to manifest themselves in
MES-driven AGIs with certain architectures. But I don't think that's
an objection towards the paper per se: just because some of the
tendencies are weaker in some systems doesn't mean they won't appear
at all. (Silly analogue: birds and hot-air balloons are less affected
by gravity than your average T-Rex, but that doesn't mean they're
immune.) I find the paper valuable and insightful simply because it
presents tendencies that manifest themselves in all useful AI systems
*at all* - certainly they're weaker in some systems, but analyzing
which drives are the strongest in which sorts of architectures would
be a separate paper (or more likely, several). It's also useful to be
aware of them when designing AGI architectures - one might want to
design their system in such a way as to minimize or maximize the
impact of specific drives. (People have been talking about
Friendliness theory for a long time, but I'd say this is one of the
first papers actually contributing something practically useful to
that field...)

Well, bear in mind what I said above: if he is saying that some "drives" of some kind will exist, then he makes a trivial claim. But if he makes stronger claims that depend on the assumption of a GS system - and I think I have shown that he does - then his claims are questionable.

For example, that opening statement I quoted above is completely untrue of MES systems! So his very first claim turns out to be invalid: robots would not inevitably do such things, regardless of how they were built.

It is because of that level of uselessness that I regard the paper as such a non-contribution to the debate. It seems to me to start from a wrong assumption and then proceed to derive some wrong conclusions ... a disaster, if what we want is clarity and progress.

Do you think he derives any claims that are independent of the GS assumption? I cannot see any, but I am open to the possibility.

Drive 3: AIs will want to preserve their utility functions. Since the utility function constructed was a model of the AI's goals, this drive is equivalent to saying "AIs will want to preserve their goals" (or at least the goals that are judged as the most important ones). The reasoning for this should be obvious - if a goal is removed from the AI's motivational system, the AI won't work to achieve the goal anymore, which is bad from the point of view of an AI that currently does want the goal to be achieved.
This is, I believe, only true of a rigidly deterministic GS system, but I can demonstrate easily enough that it is not true of at least one type of MES system.

Here is the demonstration (I originally made this argument when I first arrived on the SL4 list a couple of years ago, and I do wonder if it was one of the reasons why some of the people there took an instant dislike to me). I, as a human being, am driven by goals which include my sexuality, and part of that, for me, is the drive to be heterosexual only.  In real life I have no desire to cross party lines:  no judgement implied, it just happens to be the way I am wired.

However, as an AGI researcher, I *know* that I would be able to
rewire myself at some point in the future so that I would actually
break this taboo.  Knowing this, would I do it, perhaps as an
experiment?  Well, as the me of today, I don't want to do that, but
I am aware that the me of tomorrow (after the rewiring) would be
perfectly happy about it. Knowing that my drives today contain a
zero desire to cross gender lines is one thing, but in spite of
that I might be happy to switch my wiring so that I *did* enjoy it.


This means that by intellectual force I have been able to at least consider the possibility of changing my drive system to like something that, today, I absolutely do not want.  I know it would do no harm, so it is open as a possibility.

Good example - part of why it took me so long to answer this e-mail was because I was trying to come up with a counter-example, or an alternative explanation. The closest that I got was to suggest that, in GS terms, you only have "be only heterosexual" as a subgoal of some higher principle, not as a goal in itself. Now obviously the super/subgoal terminology is based on GS architectures, and as such you might be right in that this drive only applies to GS systems... but on the other hand, an MES-based AI would also have things that it considered more important than others, so I'm not sure that an analogous reasoning might not apply to them. But I still don't find that answer of mine fully satisfying.

Yes, I share your concern that this example might work because, in some sense, the sexuality drive is more mutable than some others. For example, if I were an AGI I would not deliberately experiment with the motivation that we would call paranoid psychosis, for the simple reason that I know this would be immensely dangerous.

Nevertheless, it seems to me that the distinction between *being* a creature driven by a drive, and being a *thinker* that contemplates the possibility of changing its own drive system, is a fascinating one. And, at the very least, it is not obvious to me that a thinker is beholden to all its drives if it knows that after a drive change it will be, by definition, happy with the change!

A concluding thought. In our discussion we have not paid too much attention to the viability of a Goal-Stack system, but I think this is most important. Nobody has actually built an AGI (no, not a narrow AI, but a real AGI) that is controlled by a goal stack. Part of the problem is that when you build a narrow AI you can use pretty low-level, non-general goals that unpack quite nicely, but when you try to make a system that is capable of quite general behavior you are forced to confront the problem of how to guide its behavior, and that starts to make your goal stack have weirdly abstract goals like "Improve thyself".

I believe that because of that tendency for the top-level goals to become so abstract as to be meaningless, the whole concept of a Goal-Stack AGI breaks down completely. This is something that simply does not scale up. Regrettably, my attempts to get a wide range of people to discuss this issue have met with bizarre forms of hostility, so instead of making it an important research topic, I have achieved nothing. This seems like a crazy situation, because people continue to talk about AGI systems as if they will be driven by a goal stack, and that makes so many of their conclusions invalid.
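To make that structural point concrete, here is a minimal, hypothetical goal-stack loop (a toy illustration with invented goal names; it is not anyone's published design, and certainly not an MES). The stack machinery itself is trivial; all of the difficulty is concentrated in decompose(), and it is that unpacking step which, for an abstract top-level goal like "Improve thyself", nobody has shown how to write in a way that scales:

    # Toy goal-stack loop, for illustration only.  The goal names and the
    # decomposition table are invented; nothing here is a real AGI design.
    KNOWN_DECOMPOSITIONS = {
        "make tea": ["boil water", "put teabag in cup", "pour water"],
    }
    PRIMITIVE_ACTIONS = {"boil water", "put teabag in cup", "pour water"}

    def decompose(goal):
        """Return subgoals for `goal`; an empty list means 'just do it'."""
        if goal in KNOWN_DECOMPOSITIONS:
            return KNOWN_DECOMPOSITIONS[goal]
        if goal in PRIMITIVE_ACTIONS:
            return []
        # Abstract goals such as "improve thyself" have no agreed decomposition:
        raise NotImplementedError(f"no decomposition known for {goal!r}")

    def execute(goal):
        print("executing:", goal)      # stand-in for acting in the world

    def run(top_level_goal):
        stack = [top_level_goal]
        while stack:
            goal = stack.pop()
            subgoals = decompose(goal)
            if subgoals:
                stack.extend(reversed(subgoals))   # push children, do them in order
            else:
                execute(goal)

    run("make tea")           # works: the concrete goal unpacks cleanly
    # run("improve thyself")  # fails at decompose(): the loop is easy, the
    #                         # unpacking of an abstract top-level goal is not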


Enough for now.




Richard Loosemore