Hank Conn wrote:
Although I understand, in vague terms, the idea Richard is attempting to express, I don't see why having "massive numbers of weak constraints" or "large numbers of connections from [the] motivational system to [the] thinking system" gives any more reason to believe a system is reliably Friendly (without any further specification of the actual processes) than one with "a few strong constraints" or "a small number of connections between the motivational system and the thinking system". The Friendliness of the system would still depend just as strongly on the actual meaning of the connections and constraints, regardless of their number, and merely giving an analogy to an extremely reliable non-deterministic system (an Ideal Gas) does nothing to explain how you are going to replicate this in the motivational system of an AGI. -hank

Hank,

There are three things in my proposal that can be separated, and perhaps it will help clear things up a little if I explicitly distinguish them.

The first is a general principle about "stability" in the abstract, while the second is about the particular way in which I see a motivational system being constructed so that it is stable. The third is how we take a stable motivational system and ensure it is Stable+Friendly, not just Stable.



About stability in the abstract. A system can be governed, in general, by two different types of mechanism (they are really two extremes on a continuum, but that is not important): the fragile, deterministic type, and the massively parallel weak constraints type. A good example of the fragile deterministic type would be a set of instructions for getting to a particular place in a city which consisted of a sequence of steps you should take, along named roads, from a special starting point. An example of a weak constraint version of the same thing would be to give a large set of clues to the position of the place (near a pond, near a library, in an area where Dickens used to live, opposite a house painted blue, near a small school, etc.).

The difference between these two approaches shows up in the effect of a disturbance on the system (errors, or whatever): the fragile one only needs to be a few steps out and the whole thing breaks down, whereas the multiple-constraints version can tolerate enormous amounts of noise and still be extremely accurate. (You could look at the game of Twenty Questions in the same way: twenty vague questions can serve to pin down most objects in the world of our experience.)
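To make that contrast concrete, here is a toy sketch in Python (entirely my own illustration, with made-up place names and features, not part of the proposal itself): a place is identified by scoring it against many weak clues, and the answer survives even when the clue list is corrupted.

```python
# Toy illustration (my own sketch): each place in a hypothetical city
# is described by a set of features, and we locate the target by
# counting how many weak clues each place satisfies.
places = {
    "station":  {"near_pond", "near_road", "painted_red"},
    "library":  {"near_pond", "near_school", "dickens_area"},
    "house_42": {"near_pond", "near_library", "dickens_area",
                 "opposite_blue_house", "near_small_school"},
    "market":   {"near_road", "painted_red", "near_school"},
}

# Weak-constraint description of the target: many vague clues.
clues = ["near_pond", "near_library", "dickens_area",
         "opposite_blue_house", "near_small_school"]

def locate(clue_list):
    """Pick the place that satisfies the most clues (weak constraints)."""
    return max(places, key=lambda p: sum(c in places[p] for c in clue_list))

# Corrupt the description: drop one clue and add one wrong one.
# The answer degrades gracefully instead of failing outright.
noisy = clues[1:] + ["painted_red"]

print(locate(clues))   # house_42
print(locate(noisy))   # still house_42
```

A step-by-step set of directions has no analogue of this: one wrong step and every subsequent step is wrong too.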

What is the significance of this? Goal systems in conventional AI have an inherent tendency to belong to the fragile/deterministic class. Why does this matter? Because it would take very little for the AI system to change from its initial design (with friendliness built into its supergoal) to one in which the friendliness no longer dominated. There are various ways this could happen, but the most important, for my purposes, is where "Be Friendly" (or however the supergoal is worded) starts to depend on interpretation on the part of the AI, and that interpretation starts to get distorted. You know the kind of scenario people come up with: the AI is told to be friendly, but it eventually decides that because people are unhappy much of the time, the only logical way to stop all the unhappiness is to eliminate all the people. Something stupid like that. If you trace back the reasons why the AI could have come to such a dumb conclusion, you eventually realize that it is because the motivational system was so fragile that it was sensitive to very, very small perturbations: basically, one wrong turn in the logic and the result could be absolutely anything. (In much the same way that one small wrong step, or one unanticipated piece of road construction, could ruin a set of directions that told you to go 251 steps east on Oxford Street, then 489 steps north on.... etc.)

The more you look at those conventional goal systems, the more fragile they look. I cannot give all the arguments here because they are extensive, so maybe you can take my word for it. This is one reason (though not the only one) why efforts to mathematically prove the validity of one of those goal systems under recursive self-improvement are, frankly, a complete joke.

Now, what I have tried to argue is that there are other ways to ensure the stability of a system: the multiple-weak-constraints idea is what was behind my original mention of an Ideal Gas. The P, V and T of an Ideal Gas emerge from many weak constraints (the random movements of vast numbers of constituent particles), and precisely because of those vast numbers, the P, V and T are exquisitely predictable.
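That statistical point can be shown in a few lines of code (my own toy simulation, nothing more than the law of large numbers at work): an aggregate built from many random per-particle contributions fluctuates far less than any individual contribution does, and the fluctuation shrinks as the number of particles grows.

```python
# Toy simulation (my own sketch): a macroscopic quantity as the mean of
# many random microscopic contributions, standing in for the way P, V
# and T emerge from vast numbers of molecular motions.
import random
import statistics

random.seed(1)

def aggregate(n_particles):
    """Mean of n random per-particle contributions."""
    return statistics.fmean(random.uniform(0.0, 2.0)
                            for _ in range(n_particles))

def spread(n_particles, trials=200):
    """How much the aggregate varies across repeated trials."""
    return statistics.pstdev(aggregate(n_particles)
                             for _ in range(trials))

# With 10 particles the aggregate wobbles noticeably; with 10,000 it
# is pinned down tightly, even though every particle is pure noise.
print(spread(10), spread(10_000))
```

None of the individual particles is predictable at all; the predictability lives entirely in the aggregate.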

The question becomes: can you control/motivate the behavior of an AI using *some* variety of motivational system that belongs in the "massive numbers of weak constraints" category? If you could find any way of doing this you could perhaps make the thing as reliable as an Ideal Gas.

So that is Part One of the argument: if you can find a motivational system of that sort, you are (almost) home and dry. My initial claim is that this is possible in principle.



Part two of the argument is then about *how* to do that.

To understand how this is feasible, imagine an AI system in which the "concepts" (basic units of knowledge) grow from primitive seed concepts (like Momma, Food, Drink, Warmth, Friends, Play, Curiosity...). At the beginning the primitive seeds are coupled together by an equally primitive motivational system that basically hardwires the ways that the system can get pleasure.

What happens after that is that new concepts develop in abundance. Glossing over an immensely complex process, we can note two things:

1) The system is not governed by a strict goal stack; it improvises the things it "wants" to do within broad parameters defined by the goal system. What this means is that there is an attentional focus: a set of concepts and models that are currently most active and that define what the system is thinking about and trying to do at any given moment. This attentional focus wanders around and occasionally gets pushed in one direction or another by the motivational system. The motivational system is not smart (there is no knowledge base there); it just pulls things in one direction or another.

2) The way that it pulls things around depends on two different kinds of connections in the system. The concepts are mutually linked because they represent the learned structure of the world. But quite separate from these regular links there is another set that grows from the primitive concepts like branches on a tree: these cross-connect the system, so that every concept is linked back, ultimately, to the primitives.
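The two points above can be caricatured in code (entirely my own toy model, assuming a spreading-activation style of concept network, which is one natural reading but not something spelled out here): the focus drifts over associative links, and a motivational weight merely tilts the drift rather than dictating a goal stack.

```python
# Toy model (my own sketch, hypothetical concepts): an attentional focus
# wandering over associative links, biased -- not dictated -- by a
# motivational pull attached to one concept.
import random

random.seed(2)

links = {
    "rain":     ["umbrella", "clouds"],
    "umbrella": ["shop", "rain"],
    "clouds":   ["sky", "rain"],
    "shop":     ["food", "umbrella"],
    "sky":      ["clouds"],
    "food":     ["shop"],
}
motivational_pull = {"food": 3.0}  # a primitive drive weights one concept

def drift(focus, steps):
    """Random walk over links, tilted toward motivationally weighted concepts."""
    visits = {c: 0 for c in links}
    for _ in range(steps):
        neighbours = links[focus]
        weights = [motivational_pull.get(n, 1.0) for n in neighbours]
        focus = random.choices(neighbours, weights=weights)[0]
        visits[focus] += 1
    return visits

visits = drift("rain", steps=2000)
# The focus still wanders everywhere, but it dwells near "food" far more
# than near equally-connected but unweighted concepts like "sky".
print(visits["food"], visits["sky"])
```

Notice that nothing in this walk is a goal stack: the pull only changes the statistics of where the focus tends to go.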


Now, one set of primitives comprises the primitive motivations, together with the plans and actions the system can take in order to satisfy them. As time goes on, the sophistication of those plans can become immense, but they still connect back. For example, in childhood, if you were cold you would snuggle up to a warm object, but as an adult you plan for cold situations by getting a warm house (and you plan to earn money to pay for the house and for fuel, etc., etc.). When you look at all the things we know how to do, you find that all of those concepts (plans, schemas, models, etc.) are linked back to the primitive motivations, through intermediaries.
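That linking-back can be sketched as a simple traversal (a hypothetical concept space of my own invention, using the warmth example above): every grown concept, however sophisticated, traces through intermediaries to the primitives it developed from.

```python
# Toy sketch (my own illustration): following developmental links from
# an adult concept back to the motivational primitives it grew out of.
PRIMITIVES = {"Warmth", "Food", "Curiosity"}

# Hypothetical grown concept space: concept -> concepts it developed from.
grew_from = {
    "snuggle_up":  ["Warmth"],
    "warm_house":  ["snuggle_up"],
    "earn_money":  ["warm_house", "Food"],
    "career_plan": ["earn_money", "Curiosity"],
}

def root_primitives(concept):
    """Follow the branches back down the tree to the primitives."""
    if concept in PRIMITIVES:
        return {concept}
    roots = set()
    for parent in grew_from.get(concept, []):
        roots |= root_primitives(parent)
    return roots

print(sorted(root_primitives("career_plan")))
# ['Curiosity', 'Food', 'Warmth']
```

The point of the sketch is just that the connection never disappears: however many intermediaries accumulate, the traversal always bottoms out in primitives.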

What this means, in practice, is that in the adult, the motivational system has its fingers in all of the stuff that goes on in the thinking part of the system, because of the historical development of that stuff.

Further, when the system tries to decide what to do, it considers candidate plans and runs them through its internal simulator to see whether they are consistent with its primitive motivational goals (in other words, it thinks about, and has a gut feeling about, whether a particular proposed behavior is really consistent with those goals). If, at any stage, this modelling turns up any inconsistencies, alarm bells go off and the system is forced to confront the problem (you'll recall the example I gave originally about a person who wanted to sell their mother for money, and the consequences this had inside their thinking system).
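In caricature (again my own invention, with made-up check names, not the actual design): each candidate plan's simulated effects are run past every primitive-level check, and any violation rings an alarm bell before the plan can be acted on.

```python
# Toy sketch (my own, hypothetical checks): vetting a candidate plan's
# predicted effects against primitive motivations.
PRIMITIVE_CHECKS = {
    "keeps_people_safe": lambda effects: "harms_person" not in effects,
    "preserves_bonds":   lambda effects: "sells_mother" not in effects,
}

def vet_plan(predicted_effects):
    """Return the list of violated primitive checks (the 'alarm bells')."""
    return [name for name, check in PRIMITIVE_CHECKS.items()
            if not check(predicted_effects)]

# The sell-your-mother plan trips an alarm before it can be acted on.
print(vet_plan({"sells_mother", "gains_cash"}))   # ['preserves_bonds']
print(vet_plan({"earns_salary", "gains_cash"}))   # [] -- no alarms
```

The real system would have vast numbers of such checks, entangled with the whole concept space; the sketch only shows the shape of the mechanism, not its scale.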

Now, if you look at the overall design of this motivational system you will notice that it does not look at all like a traditional AI goal stack (aka planning system), and that it does look like there are massive numbers of constraints at work.

There are constraints in at least two places: the motivational tree of connections that reaches out into the concept space, and the check-and-balance mechanisms that cause plans to be tested against evolving criteria. Each of these involves such huge numbers of constraints that it is very hard to shake the system away from the mindset that it has, once it is fully grown.

This, I claim, is the basic outline of the way in which stability is achieved in a motivational system.




Finally, there is the question of why such a system would be "friendly" at all. How do we define friendliness inside the system, in such a way that the system actually does things that seem to us to be friendly? After all, it might be utterly stable, but we need stable+(does what we want).

Not easy, but I note that there is an ideal type of human behavior that is of the sort we desire. We can all imagine a "saintlike" human who is extraordinarily fair and reasonable. I believe that that archetype can actually be understood in terms of the balance of the various motivational primitives inside them, and the way they developed. That is where our research efforts should be directed: toward understanding exactly how such a motivational system is structured.

All I can do on that point is to say that I see it as being possible, and I see no arguments being advanced as to why that should be an impossible goal for us to achieve.

At the end of the day, what we could do is give the system the purest version of that "saintlike" motivational system (NOT a mother-hen, Nanny type, I hasten to add, because I know that some people always interpret it that way), and then accept that, hey, it is going to have trouble with some of the same moral and ethical dilemmas that we have trouble with. But even though it, and we, will have those dilemmas, it will be no more unpredictable and alien in its thoughts than an extremely (1000x) peaceful, benign version of Mahatma Gandhi (and without some of the motivational flaws that even Gandhi possessed).

What I have done is to show a way to get there. A complete solution, in principle, to the Friendliness Problem, in which only the fine details remain. Perhaps there is a showstopper in the fine details. I don't believe so, but if anyone else thinks there is, they need to be specific, and not just say that "it doesn't look like it will work".


Hope that helps.




Richard Loosemore








-----
This list is sponsored by AGIRI: http://www.agiri.org/email