Hank Conn wrote:
Although I understand, in vague terms, the idea Richard is attempting to express, I don't see why having "massive numbers of weak constraints" or "large numbers of connections from [the] motivational system to [the] thinking system" gives any more reason to believe a system is reliably Friendly (without any further specification of the actual processes) than one with "a few strong constraints" or "a small number of connections between the motivational system and the thinking system". The Friendliness of the system would still depend just as strongly on the actual meaning of the connections and constraints, regardless of their number, and merely giving an analogy to an extremely reliable non-deterministic system (an Ideal Gas) does nothing to explain how you are going to replicate this in the motivational system of an AGI. -hank

Hank,

There are three things in my proposal that can be separated, and perhaps it will help clear things up a little if I explicitly distinguish them.

The first is a general principle about "stability" in the abstract, while the second is about the particular way in which I see a motivational system being constructed so that it is stable. The third is how we take a stable motivational system and ensure it is Stable+Friendly, not just Stable.



About stability in the abstract. A system can be governed, in general, by two different types of mechanism (they are really two extremes on a continuum, but that is not important): the fragile, deterministic type, and the massively parallel weak constraints type. A good example of the fragile deterministic type would be a set of instructions for getting to a particular place in a city which consisted of a sequence of steps you should take, along named roads, from a special starting point. An example of a weak constraint version of the same thing would be to give a large set of clues to the position of the place (near a pond, near a library, in an area where Dickens used to live, opposite a house painted blue, near a small school, etc.).

The difference between these two approaches shows up in the effect of a disturbance on the system (errors, or whatever): the fragile one only needs to be a few steps out and the whole thing breaks down, whereas the multiple-constraints version can tolerate enormous amounts of noise and still be extremely accurate. (You could look at the game of Twenty Questions in the same way: twenty vague questions can serve to pin down most objects in the world of our experience.)
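To make that contrast concrete, here is a toy sketch in Python (entirely my own illustration, with made-up place names and features, not part of the proposal itself): a place is identified by scoring it against many weak clues, and the answer survives even when the clue list is corrupted.

```python
# Toy illustration (my own sketch): each place in a hypothetical city
# is described by a set of features, and we locate the target by
# counting how many weak clues each place satisfies.
places = {
    "station":  {"near_pond", "near_road", "painted_red"},
    "library":  {"near_pond", "near_school", "dickens_area"},
    "house_42": {"near_pond", "near_library", "dickens_area",
                 "opposite_blue_house", "near_small_school"},
    "market":   {"near_road", "painted_red", "near_school"},
}

# Weak-constraint description of the target: many vague clues.
clues = ["near_pond", "near_library", "dickens_area",
         "opposite_blue_house", "near_small_school"]

def locate(clue_list):
    """Pick the place that satisfies the most clues (weak constraints)."""
    return max(places, key=lambda p: sum(c in places[p] for c in clue_list))

# Corrupt the description: drop one clue and add one wrong one.
# The answer degrades gracefully instead of failing outright.
noisy = clues[1:] + ["painted_red"]

print(locate(clues))   # house_42
print(locate(noisy))   # still house_42
```

A step-by-step set of directions has no analogue of this: one wrong step and every subsequent step is wrong too.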

What is the significance of this? Goal systems in conventional AI have an inherent tendency to belong to the fragile/deterministic class. Why does this matter? Because it would take very little for the AI system to change from its initial design (with friendliness built into its supergoal) to one in which the friendliness no longer dominated. There are various ways this could happen, but the most important, for my purposes, is where "Be Friendly" (or however the supergoal is worded) starts to depend on interpretation on the part of the AI, and that interpretation starts to get distorted. You know the kind of scenario people come up with: the AI is told to be friendly, but it eventually decides that because people are unhappy much of the time, the only logical way to stop all the unhappiness is to eliminate all the people. Something stupid like that. If you trace back the reasons why the AI could have come to such a dumb conclusion, you eventually realize that it is because the motivational system was so fragile that it was sensitive to very, very small perturbations: basically, one wrong turn in the logic and the result could be absolutely anything. (In much the same way that one small wrong step, or one unanticipated piece of road construction, could ruin a set of directions that told you to go 251 steps east on Oxford Street, then 489 steps north on.... etc.)

The more you look at those conventional goal systems, the more fragile they look. I cannot give all the arguments here because they are extensive, so maybe you can take my word for it. This is one reason (though not the only one) why efforts to mathematically prove the validity of one of those goal systems under recursive self-improvement are, frankly, a complete joke.

Now, what I have tried to argue is that there are other ways to ensure the stability of a system: the multiple-weak-constraints idea is what was behind my original mention of an Ideal Gas. The P, V and T of an Ideal Gas emerge from many weak constraints (the random movements of vast numbers of constituent particles), and precisely because of those vast numbers, the P, V and T are exquisitely predictable.
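That statistical point can be shown in a few lines of code (my own toy simulation, nothing more than the law of large numbers at work): an aggregate built from many random per-particle contributions fluctuates far less than any individual contribution does, and the fluctuation shrinks as the number of particles grows.

```python
# Toy simulation (my own sketch): a macroscopic quantity as the mean of
# many random microscopic contributions, standing in for the way P, V
# and T emerge from vast numbers of molecular motions.
import random
import statistics

random.seed(1)

def aggregate(n_particles):
    """Mean of n random per-particle contributions."""
    return statistics.fmean(random.uniform(0.0, 2.0)
                            for _ in range(n_particles))

def spread(n_particles, trials=200):
    """How much the aggregate varies across repeated trials."""
    return statistics.pstdev(aggregate(n_particles)
                             for _ in range(trials))

# With 10 particles the aggregate wobbles noticeably; with 10,000 it
# is pinned down tightly, even though every particle is pure noise.
print(spread(10), spread(10_000))
```

None of the individual particles is predictable at all; the predictability lives entirely in the aggregate.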

The question becomes: can you control/motivate the behavior of an AI using *some* variety of motivational system that belongs in the "massive numbers of weak constraints" category? If you could find any way of doing this you could perhaps make the thing as reliable as an Ideal Gas.

So that is Part One of the argument: if you can find a motivational system of that sort, you are (almost) home and dry. My initial claim is that this is possible in principle.



Part two of the argument is then about *how* to do that.

To understand how this is feasible, imagine an AI system in which the "concepts" (basic units of knowledge) grow from primitive seed concepts (like Momma, Food, Drink, Warmth, Friends, Play, Curiosity...). At the beginning the primitive seeds are coupled together by an equally primitive motivational system that basically hardwires the ways that the system can get pleasure.

What happens after that is that new concepts develop in abundance. Glossing over an immensely complex process, we can note two things:

1) The system is not governed by a strict goal stack; it improvises the things it "wants" to do within broad parameters defined by the goal system. What this means is that there is an attentional focus: a set of concepts and models that are currently most active and that define what the system is thinking about and trying to do at any given moment. This attentional focus wanders around and occasionally gets pushed in one direction or another by the motivational system. The motivational system is not smart (there is no knowledge base there); it just pulls things in one direction or another.

2) The way that it pulls things around depends on two different kinds of connections in the system. The concepts are mutually linked because they represent the learned structure of the world. But quite separate from these regular links there is another set that grows from the primitive concepts like branches on a tree: these cross-connect the system, so that every concept is linked back, ultimately, to the primitives.
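The two points above can be caricatured in code (entirely my own toy model, assuming a spreading-activation style of concept network, which is one natural reading but not something spelled out here): the focus drifts over associative links, and a motivational weight merely tilts the drift rather than dictating a goal stack.

```python
# Toy model (my own sketch, hypothetical concepts): an attentional focus
# wandering over associative links, biased -- not dictated -- by a
# motivational pull attached to one concept.
import random

random.seed(2)

links = {
    "rain":     ["umbrella", "clouds"],
    "umbrella": ["shop", "rain"],
    "clouds":   ["sky", "rain"],
    "shop":     ["food", "umbrella"],
    "sky":      ["clouds"],
    "food":     ["shop"],
}
motivational_pull = {"food": 3.0}  # a primitive drive weights one concept

def drift(focus, steps):
    """Random walk over links, tilted toward motivationally weighted concepts."""
    visits = {c: 0 for c in links}
    for _ in range(steps):
        neighbours = links[focus]
        weights = [motivational_pull.get(n, 1.0) for n in neighbours]
        focus = random.choices(neighbours, weights=weights)[0]
        visits[focus] += 1
    return visits

visits = drift("rain", steps=2000)
# The focus still wanders everywhere, but it dwells near "food" far more
# than near equally-connected but unweighted concepts like "sky".
print(visits["food"], visits["sky"])
```

Notice that nothing in this walk is a goal stack: the pull only changes the statistics of where the focus tends to go.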


Now, one set of primitives comprises the primitive motivations, together with the plans and actions the system can take in order to satisfy them. As time goes on, the sophistication of those plans can become immense, but they still connect back. For example, in childhood, if you were cold you would snuggle up to a warm object, but as an adult you plan for cold situations by getting a warm house (and you plan to earn money to pay for the house and for fuel, etc., etc.). When you look at all the things we know how to do, you find that all of those concepts (plans, schemas, models, etc.) are linked back to the primitive motivations, through intermediaries.
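That linking-back can be sketched as a simple traversal (a hypothetical concept space of my own invention, using the warmth example above): every grown concept, however sophisticated, traces through intermediaries to the primitives it developed from.

```python
# Toy sketch (my own illustration): following developmental links from
# an adult concept back to the motivational primitives it grew out of.
PRIMITIVES = {"Warmth", "Food", "Curiosity"}

# Hypothetical grown concept space: concept -> concepts it developed from.
grew_from = {
    "snuggle_up":  ["Warmth"],
    "warm_house":  ["snuggle_up"],
    "earn_money":  ["warm_house", "Food"],
    "career_plan": ["earn_money", "Curiosity"],
}

def root_primitives(concept):
    """Follow the branches back down the tree to the primitives."""
    if concept in PRIMITIVES:
        return {concept}
    roots = set()
    for parent in grew_from.get(concept, []):
        roots |= root_primitives(parent)
    return roots

print(sorted(root_primitives("career_plan")))
# ['Curiosity', 'Food', 'Warmth']
```

The point of the sketch is just that the connection never disappears: however many intermediaries accumulate, the traversal always bottoms out in primitives.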

What this means, in practice, is that in the adult, the motivational system has its fingers in all of the stuff that goes on in the thinking part of the system, because of the historical development of that stuff.

Further, when the system tries to decide what to do, it considers candidate plans and runs them through its internal simulator to see whether they are consistent with its primitive motivational goals (in other words, it thinks about, and has a gut feeling about, whether a particular proposed behavior is really consistent with those goals). If, at any stage, this modelling turns up any inconsistencies, alarm bells go off and the system is forced to confront the problem (you'll recall the example I gave originally about a person who wanted to sell their mother for money, and the consequences this had inside their thinking system).
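In caricature (again my own invention, with made-up check names, not the actual design): each candidate plan's simulated effects are run past every primitive-level check, and any violation rings an alarm bell before the plan can be acted on.

```python
# Toy sketch (my own, hypothetical checks): vetting a candidate plan's
# predicted effects against primitive motivations.
PRIMITIVE_CHECKS = {
    "keeps_people_safe": lambda effects: "harms_person" not in effects,
    "preserves_bonds":   lambda effects: "sells_mother" not in effects,
}

def vet_plan(predicted_effects):
    """Return the list of violated primitive checks (the 'alarm bells')."""
    return [name for name, check in PRIMITIVE_CHECKS.items()
            if not check(predicted_effects)]

# The sell-your-mother plan trips an alarm before it can be acted on.
print(vet_plan({"sells_mother", "gains_cash"}))   # ['preserves_bonds']
print(vet_plan({"earns_salary", "gains_cash"}))   # [] -- no alarms
```

The real system would have vast numbers of such checks, entangled with the whole concept space; the sketch only shows the shape of the mechanism, not its scale.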

Now, if you look at the overall design of this motivational system you will notice that it does not look at all like a traditional AI goal stack (aka planning system), and that it does look like there are massive numbers of constraints at work.

There are constraints in at least two places: the motivational tree of connections that reaches out into the concept space, and the check-and-balance mechanisms that cause plans to be tested against evolving criteria. Each of these involves such huge numbers of constraints that it is very hard to shake the system away from the mindset that it has, once it is fully grown.

This, I claim, is the basic outline of the way in which stability is achieved in a motivational system.




Finally, there is the question of why such a system would be "friendly" at all. How do we define friendliness inside the system, in such a way that the system actually does things that seem to us to be friendly? After all, it might be utterly stable, but we need stable+(does what we want).

Not easy, but I note that there is an ideal type of human behavior that is of the sort we desire. We can all imagine a "saintlike" human who is extraordinarily fair and reasonable. I believe that that archetype can actually be understood in terms of the balance of the various motivational primitives inside them, and the way they developed. That is where our research efforts should be directed: toward understanding exactly how such a motivational system is structured.

All I can do on that point is to say that I see it as being possible, and I see no arguments being advanced as to why that should be an impossible goal for us to achieve.

At the end of the day, what we could do is give the system the purest version of that "saintlike" motivational system (NOT a mother-hen, Nanny type, I hasten to add, because I know that some people always interpret it that way), and then accept that, hey, it is going to have trouble with some of the same moral and ethical dilemmas that we have trouble with. But even though it, and we, will have those dilemmas, it will be no more unpredictable and alien in its thoughts than an extremely (1000x) peaceful, benign version of Mahatma Gandhi (and without some of the motivational flaws that even Gandhi possessed).

What I have done is to show a way to get there. A complete solution, in principle, to the Friendliness Problem, in which only the fine details remain. Perhaps there is a showstopper in the fine details. I don't believe so, but if anyone else thinks there is, they need to be specific, and not just say that "it doesn't look like it will work".


Hope that helps.




Richard Loosemore








-----
This list is sponsored by AGIRI: http://www.agiri.org/email