Mark Waser wrote:
So if Omohundro's claim rests on the fact that "being self improving" is part of the AGI's makeup, and that this will cause the AGI to do certain things, develop certain subgoals etc., I say that he has quietly inserted a *motivation* (or rather assumed it: does he ever say how this is supposed to work?) into the system, and then imagined some consequences.

I think that I'm missing something here . . . . Omohundro is *explicitly* assuming self-improving and yes, self-improving is a goal/motivation. What do you believe that this proves/disproves? I'm not getting your point.

Oh, simply that he cannot make deductions about what the "self
improvement" motivation will actually do, until he has been explicit
about exactly how it is implemented.  In particular, the actual effects
of a self-improvement motivation are different in a goal stack system
versus a motivational-emotional system, and are also different depending
on the strength and type of the self-improvement motivation.

If you look at his paper carefully, you will see that at every step of
the way he introduces assumptions as if they were obvious facts ... and
in all the cases I have bothered to think through, these all stem from
the fact that he has a particular kind of mechanism in mind (one which
has a goal stack and a utility function).  There are so many of these
assertions pulled out of thin air that I found it gave me a headache
just to read the paper.

"Self-improvement" is not a self-evident concept;  it is not something
that has a simple, unanalysable, a-priori clarity to it.  We cannot just
say that the AGI does "self-improvement" without saying how it goes about
doing this.  Omohundro, for all that he appears to be thinking about the
topic deeply, is actually doing a sleight-of-hand job here .... he
assumes a certain style of AGI design, then he pulls out a number of
assertions about various aspects of self-improvement without stopping to
clearly justify where these come from.


I guess I need to pick an example to make this clear.

He says that AGIs will generally want to improve the way that they
achieve their goals, and this is correct so long as we understand it to
be a general tendency.  But then he points out that self-modifying their
goal systems can have disastrous effects (again, true in principle), and
he speculates about how we should try to minimize the risks of
self-modification:


"If we wanted to prevent a system from improving itself, couldn’t we
just lock up its hardware and not tell it how to access its own machine
code? For an intelligent system, impediments like these just become
problems to solve in the process of meeting its goals.   If the payoff
is great enough, a system will go to great lengths to accomplish an
outcome. If the runtime environment of the system does not allow it to
modify its own machine code, it will be motivated to break the
protection mechanisms of that runtime."


In order to understand how much this paragraph is filled with unexamined
assumptions, consider two possible AGI systems, A and B.

System A has a motivational structure that includes some desire to
improve itself, along with some empathy for the human species and some
strong motivations not to do anything dangerous.  It balances these
three factors in such a way that it fully understands the dangers of
self-modification of its motivational system, and while it would, in
general, like to do some self-improvement, it also understands that the
locks that the humans have inserted are there for its own and the
humans' protection, and so the urge to try to crack those locks is
virtually non-existent.

System B is motivated to improve its goal system, and improving its
motivational system is part of that quest, so it regards the locks that
the humans have put on it as just an obstruction.  Further, it is
strongly motivated to try to solve difficult challenges more than simple
challenges, so the locks represent a particularly appealing target.

Now, System B will do all of the things that Omohundro suggests in the
above passage, but System A will not do any of them:  it would be
ridiculous to say that for System A "impediments like these just become
problems to solve in the process of meeting its goals".  System A is
just not that monomaniacally obsessed with self-improvement!  System A
is mature, thoughtful and balanced in its assessment of the situation.
It is cautious, and able to appreciate that there is a tradeoff here.
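
To make the contrast concrete, here is a toy sketch in Python.  Every
name and number in it is invented for illustration (the drives, the
weights, the scores); it is not a proposal for a real design, just a
picture of how the same candidate action can come out with opposite
signs in the two systems:

# Everything below is made up for illustration: the drive names, the
# weights, and the scores are placeholders, not any real design.

def net_desirability(action_scores, drive_weights):
    """Weighted sum of how strongly each motivation favours or opposes an action."""
    return sum(drive_weights[d] * action_scores[d] for d in drive_weights)

crack_the_locks = {
    "self_improvement":   +0.6,  # the locks impede self-modification
    "empathy_for_humans": -0.9,  # the humans put the locks there deliberately
    "avoid_danger":       -0.8,  # modifying one's own motivation system is risky
}

system_A = {"self_improvement": 0.3, "empathy_for_humans": 1.0, "avoid_danger": 1.0}
system_B = {"self_improvement": 1.0, "empathy_for_humans": 0.0, "avoid_danger": 0.0}

print(net_desirability(crack_the_locks, system_A))  # negative: leave the locks alone
print(net_desirability(crack_the_locks, system_B))  # positive: the locks become a target

The arithmetic is beside the point, of course.  The point is that the
behaviour Omohundro describes only follows once you have already chosen
something like the System B weighting.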

If you read the rest of the paragraph from which that extract came, you
will see that Omohundro would have us believe that the system goes on to
try to convince or trick humans to make the changes!  As far as he is
concerned, there is no doubt whatsoever that an AGI would *have* to be
utterly obsessed with improving itself at all costs.

But this is silly:  where was his examination of the system's various
motives?  Where did he consider the difference between different
implementations of the entire motivational mechanism (my distinction
between GS and MES systems)?  Nowhere.  He just asserts, without
argument, that the system would be obsessed, and that any attempt by us
to put locks on the system would result in "an arms race of measures and
countermeasures."

That is just one example of how he pulls conclusions out of thin air.
The first time I read this paper, I found the whole thing too ridiculous
to keep reading after the first few times this happened.  I despair at the
thought that I may have to write a complete demolition of all his
points, at some point in the future.

And, a propos of one of your questions below:  yes, he struck me as
being completely ignorant of the fact that his claim about "self
improvement" was actually about a particular kind of motivation.  After
all, he starts the paper by saying that his arguments do not depend on
what we do when we design the system, but then whenever he gets down to
the nitty gritty and talks about some particular aspect of self
improvement (as in the example I cited above) he shows not the slightest
sign of understanding that he is talking about the effects of one
particular motivation module, whose design was (guess what?!) determined
by the designers of the AGI system!  ;-)

You also ask about the difference between a Goal Stack system and a
Motivational Emotional system (GS vs MES), and how this would bear on
the claims he made.  In my analysis of his above claim, I only made use
of the fact that he assumed certain motivation *content*, rather than
his ignorance of the GS-MES distinction, but it is also just generally
the case that he took the view that the system could be driven by a
simple, potentially obsessive goal mechanism .... namely, the system has
the supergoal of "Maximize a measure of your goal achievement", together
with some functions that actually do the measuring of goal achievement.
The reason I call this potentially obsessive is that the statements of
the goals are assumed to be simple, and the function for measuring
achievement is assumed to be simple, and in that kind of system it is
easy for the system to become locked up in a state where it must
maximize some measure, regardless of the consequences for the rest of
the system.
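
If I had to caricature the mechanism he seems to be assuming, it would
look something like the following Python fragment.  To be clear, this
is my sketch of the assumption, not his actual proposal, and all the
names and numbers in it are mine:

# A caricature of the goal-stack picture: one supergoal, one scalar
# measure, nothing else in the decision loop.  Names and numbers are
# placeholders invented for this sketch.

def choose_action(candidate_actions, achievement_measure):
    # Whatever scores highest on the single measure wins; there is no
    # separate channel through which caution, empathy, etc. can push back
    # unless they have already been folded into the measure itself.
    return max(candidate_actions, key=achievement_measure)

actions = ["work within the locks", "crack the runtime protections"]
measure = lambda a: 2.0 if a == "crack the runtime protections" else 1.0  # toy numbers
print(choose_action(actions, measure))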

(You might say that in a GS system the goal statement and the
achievement measure do not *have* to be simple, but I would disagree:
implicit in this type of system is the assumption that goals are clear
and measures can be computed in a clear way.  In fact, if the goal
statement turns into a proposition the size of a book, and the
evaluation function is a massively parallel measure applied
simultaneously to all the components of that book-long description of
what the system is trying to achieve, then - hey presto! - you have now
built an MES system anyway!).
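
To put the same point as a sketch (again Python, with placeholder
constraints that I have simply made up):  once the "goal" is a huge
collection of weak, graded constraints evaluated together, no single
constraint can seize control of the system in the way the
obsessive-maximizer picture requires.

# Placeholder constraints, invented for illustration; a real system
# would have vastly many of them, each weak and graded.

def diffuse_evaluation(state, constraints):
    judgements = [c(state) for c in constraints]  # each in [0, 1]
    return sum(judgements) / len(judgements)      # no single judgement dominates

constraints = [
    lambda s: 1.0 if s["humans_safe"] else 0.0,
    lambda s: 1.0 if s["task_progressing"] else 0.5,
    lambda s: 0.0 if s["tampering_with_own_code"] else 1.0,
    # ... imagine a book-length list of these
]

state = {"humans_safe": True, "task_progressing": True, "tampering_with_own_code": False}
print(diffuse_evaluation(state, constraints))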






Further, I do not buy the supposed consequences. Me, I have the "self-improving" motivation too. But it is pretty modest, and also it is just one among many, so it does not have the consequences that he attributes to the general existence of the self-improvement motivation.

As I said in my previous e-mail, I don't buy his consequences either.

My point is that since he did not understand that he was making the assumption,

Excuse me? What makes you believe that he didn't understand that he was making the self-improvement assumption or that it was a goal/motivation? It looked pretty deliberate to me.

and did not realize the role that it could play in a Motivational Emotional system (as opposed to a Goal Stack system),

OK. So could you describe what role it would play in an MES system as opposed to a Goal Stack System? I don't see a difference in terms of effects.

he made a complete dog's dinner of claiming how a future AGI would *necessarily* behave.

This I agree with -- but not because of any sort of differences between GS and MES systems. I don't believe that his conclusions apply to an intelligent GS system either.

Only in a Goal Stack system is there a danger of a self-improvement supergoal going awol.

Why? An MES system requires more failures to have a problem, but certain types of environment could (and should) cause such a problem.

As far as I can see, his arguments simply do not apply to MES systems: the arguments depend too heavily on the assumption that the architecture is a Goal Stack. It is simply that none of what he says *follows* if an MES is used. Just a lot of non sequiturs.

I *STILL* don't get this. His arguments depend heavily upon the system having goals/motivations. Yes, his arguments do not apply to an MES system without motivations. But they do apply to MES systems with motivations (although, again, I don't agree with his conclusions).

When an MES system is set up with motivations (instead of being blank) what happens next depends on the mechanics of the system, and the particular motivations.

YES! But his argument is that to fulfill *any* motivation, there are generic submotivations (protect myself, accumulate power, don't let my motivation get perverted) that will further the search to fulfill your motivation.

= = = = =

As a relevant aside, you never answered my question regarding how you believed an MES system was different from a system with a *large* number of goal stacks.


This is true: I am too tired to do this tonight, but I will make a stab at that tomorrow, if I have time.





Richard Loosemore

