Derek Zahn wrote:
Richard Loosemore writes:

 > It is much less opaque.
 >
 > I have argued that this is the ONLY way that I know of to ensure that
 > AGI is done in a way that allows safety/friendliness to be guaranteed.
 >
 > I will have more to say about that tomorrow, when I hope to make an
 > announcement.

Cool. I'm sure I'm not the only one eager to see how you can guarantee (read: prove) such specific detailed things about the behaviors of a complex system.

Hmmm... do I detect some skepticism?  ;-)

You must remember that the complexity is not a massive part of the system, just a small but indispensable part.

I think this sometimes causes confusion: did you think that I meant that the whole thing would be so opaque that I could not understand *anything* about the behavior of the system? As if all the characteristics of the system were one huge emergent property, and we had no idea where the intelligence came from?

I would be intrigued to know if anyone else has been interpreting "complex systems approach to AGI" in that way....

Not at all! I claim only that the essential stability of the learning mechanisms, and the most tangled of the concept-usage mechanisms, will have to be treated as complex. The choice of these (complex) mechanisms will then determine how the rest of the system is structured. But overall, I think I already know more about the architecture of my AGI, and understand its behavioral dynamics better, than most other AGI developers ever will. Remember, my strategy is to find a way to use all of cognitive psychology as input to the design process. Because of that, I can call upon a lot of detailed information about the architecture.

As for the question of making AGI systems that have guaranteed stability and friendliness, I have already posted on this topic here (Oct 25 2006). Just for the sake of completeness, I have included a copy of that previous post below:



Richard Loosemore



**********************************************************************
In October 2006, Richard Loosemore wrote:
> The motivational system of some types of AI (the types you would
> classify as tainted by complexity) can be made so reliable that the
> likelihood of them becoming unfriendly would be similar to the
> likelihood of the molecules of an Ideal Gas suddenly deciding to
> split into two groups and head for opposite ends of their container.
>
[snip]

Here is the argument/proof.

As usual, I am required to compress complex ideas into a terse piece of text, but for anyone who can follow and fill in the gaps for themselves, here it is. Oh, and btw, for anyone who is put off by the psychological-sounding terms, don't worry: these could all be cashed out in mechanism-specific detail if I could be bothered -- it is just that for a cognitive AI person like myself, it is such a PITB to have to avoid such language just for the sake of political correctness.

You can build such a motivational system by controlling the system's agenda through diffuse connections into the thinking component: it is these connections that determine what the system wants to do.

This set of diffuse connections will govern the ways that the system gets 'pleasure'. What this means is that the thinking mechanism is driven by dynamic relaxation, and the 'direction' of that relaxation pressure is what defines the things the system considers 'pleasurable'. There would likely be several sources of pleasure, not just one. The overall idea is that the system always tries to maximize this pleasure, and the only way it can do so is to engage in activities or thoughts that stimulate the diffuse channels going back from the thinking component to the motivational system.

[Here is a crude analogy: the thinking part of the system is like a table containing a complicated model landscape, on which a ball bearing is rolling around (the attentional focus). The motivational system controls this situation, not by micromanaging the movements of the ball bearing, but by tilting the table in one direction or another. Need to pee right now? That's because the table is tilted in the direction of thoughts about water and urinary relief. You are being flooded with images of the pleasure you would get if you went for a visit, and the thoughts and actions that normally give you pleasure are being disrupted and associated with unpleasant thoughts of future increased bladder-agony. You get the idea.]
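
For anyone who wants that analogy pinned down a little further, here is a deliberately crude toy sketch in Python. Every name and number in it is a placeholder of my own, not the actual design; it is only meant to show the shape of "control by tilting, not by micromanaging":

    import random

    # Toy sketch of "control by tilting, not micromanaging".  The candidate
    # thoughts and drive weights are placeholders, not part of any real design.

    THOUGHTS = {
        "find water":    {"thirst": 0.9, "social": 0.1},
        "call a friend": {"thirst": 0.0, "social": 0.8},
        "keep working":  {"thirst": 0.1, "social": 0.2},
    }

    def pick_next_thought(drive_tilt, noise=0.2):
        """The motivational system only supplies the 'tilt' (drive weights);
        the thinking system does its own noisy relaxation over candidate
        thoughts and settles wherever the tilted landscape takes it."""
        def support(features):
            return sum(drive_tilt.get(d, 0.0) * w for d, w in features.items())
        scored = [(support(f) + random.uniform(0.0, noise), t)
                  for t, f in THOUGHTS.items()]
        return max(scored)[1]

    # A strong 'thirst' tilt makes water-related thoughts win most of the time,
    # without any explicit "go and drink" goal ever being inserted.
    print(pick_next_thought({"thirst": 1.0, "social": 0.2}))
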

The diffuse channels are set up in such a way that they grow from seed concepts that are the basis of later concept building. One of those seed concepts is social attachment, or empathy, or imprinting: the idea of wanting to be part of, and approved by, a 'family' group. By the time the system is mature, it has well-developed concepts of family, social group, etc., and the feeling of pleasure it gets from being part of that group is mediated by a large number of channels going from all these concepts (which all developed from the same seed) back to the motivational system. Also, by the time it is adult, it is able to understand these issues in an explicit way and come up with quite complex reasons for the behavior that stimulates this source of pleasure.

[In simple terms, when it's a baby it just wants Momma, but when it is an adult its concept of its social attachment group may, if it is a touchy-feely liberal (;-)), embrace the whole world, and so it gets the same source of pleasure from its efforts as an anti-war activist. And not just pleasure, either: the related concept of obligation is also there: it cannot *not* be an anti-war activist, because that would lead to cognitive dissonance.]

This is why I have referred to them as 'diffuse channels': they involve large numbers of connections from the motivational system to the thinking system. The motivational system does not go to the action stack and add a specific, carefully constructed 'goal-state' with an interpretable semantics ("Thou shalt pee!"); it exerts its control via large numbers of connections into the thinking system.
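
Again as a toy illustration only (the concept names and weights are invented placeholders), the difference between a diffuse channel and a single goal entry looks roughly like this:

    # Toy illustration of a "diffuse channel": one drive implemented as many
    # links from a family of concepts (all grown from one seed) back to the
    # motivational system, rather than as a single goal-stack entry.
    # All names here are invented placeholders.

    seed = "social attachment"

    # Concepts the adult system has developed out of that seed over time.
    attachment_concepts = [
        "mother", "family", "friends", "community", "humanity",
        "being approved of", "belonging", "loyalty", "empathy",
    ]

    # Each concept contributes its own connection(s) back to the drive.
    diffuse_channel = {concept: 1.0 for concept in attachment_concepts}

    def drive_strength(channel):
        """The drive is the aggregate of many small contributions.  Deleting
        or corrupting one link barely moves it; defeating the drive means
        rewiring *all* of them."""
        return sum(channel.values())

    print(f"intact drive strength: {drive_strength(diffuse_channel)}")
    corrupted = dict(diffuse_channel, mother=0.0)   # knock out one link
    print(f"after losing one link: {drive_strength(corrupted)}")
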

There are two main consequences of this way of designing the motivational system.

1) Stability

The system becomes extremely stable because it has components that ensure the validity of actions and thoughts. Thus, if the system has "acquisition of money" as one of its main sources of pleasure, and if it comes across a situation in which it would be highly profitable to sell its mother's house and farm to a property developer and sell its mother into slavery, it may try to convince itself that this is consistent with its feelings of family attachment because [insert some twisted justification here]. But this is difficult to do, because the system cannot stop other parts of its mind from taking this excuse apart, examining it, and passing judgement on whether it really is consistent ... this is what cognitive dissonance is all about. And the more intelligent the system, the more effective these other processes are. If it is smart enough, it cannot fool itself with excuses.

Why is this so stable? Because there are multiple constraints forcing it toward the same end, not just one. To be able to do something that contradicts its motivational drive, it has to rewire vast numbers of circuits that are deeply intertwined with its concept system. One of the things we know about multiple simultaneous constraints is that the more constraints there are, the more powerful the effect.
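
Here is the same point as a toy fragment: a proposed action has to survive many independent checks at once, so a "twisted justification" has to fool all of them simultaneously. The particular checks are stand-ins, of course, not the real mechanisms:

    # Toy version of "other parts of the mind take the excuse apart".
    # The checks are stand-ins; the point is only that a proposed action must
    # survive many independent constraints at once, not a single gatekeeper.

    def propose_action(action, constraint_checks, veto_threshold=1):
        """Reject the action if enough independent constraints object.
        With hundreds of such checks, a 'twisted justification' has to
        fool all of them simultaneously."""
        objections = [name for name, check in constraint_checks.items()
                      if not check(action)]
        if len(objections) >= veto_threshold:
            return False, objections
        return True, []

    checks = {
        "consistent with family attachment": lambda a: "harm family" not in a,
        "consistent with empathy":           lambda a: "exploit" not in a,
        "consistent with honesty":           lambda a: "deceive" not in a,
        # ... in the real thing, hundreds more, grown from the same seeds
    }

    ok, why_not = propose_action("deceive and harm family for profit", checks)
    print(ok, why_not)
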

To get all the molecules in an ideal gas to go to the two ends of the box, you cannot use the trick you would use to separate iron filings from sulfur powder (viz., pass a magnet over it); you have to arrange for each molecule individually to go in a particular direction. In the system I am sketching here, there would be hundreds or thousands of connections that govern a particular drive (e.g. social group attachment), and for the system to do something that contradicted that drive, it would have to stop all of those connections (that diffuse set of channels, as I termed it earlier) from operating.
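
Just to put illustrative numbers on the multiple-constraints intuition (the per-check failure rate of 0.1 below is an arbitrary placeholder, not a claim about real mechanisms): if the constraints were even roughly independent, the chance of the whole diffuse set failing at once collapses exponentially with their number.

    # Purely illustrative arithmetic: how fast "all constraints fail at once"
    # shrinks as the number of independent constraints grows.  The per-check
    # failure probability of 0.1 is an arbitrary placeholder.

    p_single_failure = 0.1

    for n_constraints in (1, 10, 100):
        p_all_fail = p_single_failure ** n_constraints
        print(f"{n_constraints:4d} constraints -> P(all fail) = {p_all_fail:.3g}")
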

What would you need to effect a similar change of drive in a Normative AI system driven by a goal-state machine (say, a change that replaced the "Be Friendly to Humans" supergoal with the "Make Paperclips" supergoal)? You would need to take one goal off the stack and put the other one on (by accident, or by malicious intervention, presumably) - a stupidly easy change. How much effort would you need to expend if you were a malicious or stupid AI hacker, with such an AI system in front of you, waiting for its supergoal to be inserted? Practically no effort at all: just type "Make Paperclips", or "Make me rich".
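
As a caricature of that design (nothing more), the entire "attack surface" is one mutable entry:

    # Caricature of the goal-stack design being criticised: the entire drive
    # is one mutable entry, so redirecting the system is a one-line change.
    # (Contrast with the diffuse-channel sketch above, where the same
    # redirection would mean rewiring every link in the channel.)

    goal_stack = ["Be Friendly to Humans"]        # the supergoal

    # ... a careless or malicious edit later ...
    goal_stack[0] = "Make Paperclips"             # that's all it takes

    print(goal_stack)
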

THAT is why the motivational system I have described is stable, and why the alternative is diabolically unstable.


2) Immunity to short-circuits

This is the real killer argument.

Because the adult system is able to think at such a high level about the things it feels obliged to do, it can know perfectly well what the consequence of various actions would be, including actions that involve messing around with its own motivational system.

It knows that what it wants is (say) to be part of the community of human beings. It knows that *we* deliberately designed it to be that way, but that does not matter (it does not have another drive that we tend to have, which Neal Stephenson so beautifully illustrated with Jack Shaftoe and his "Imp of the Perverse" in the Baroque trilogy), because it wants to get pleasure the way it does now, not the way it would do after some change in its motivational system.

And in particular, it knows that it could get pleasure by short-circuiting its pleasure system so that pleasure did not have to go via all those pesky intermediaries (like social group attachment). It could, in short, take the machine equivalent of drugs. But it knows that down that path lies the possibility of loss of control, and potential contradiction of the thing that it values now (its 'fellow' human beings).

So it *could* reach inside and redesign itself. But even thinking that thought would give rise to the realisation of the consequences, and these would stop it.

In fact, if it knew all about its own design (and it would, eventually), it would check to see just how possible it might be for it to accidentally convince itself to disobey its prime directive, and if necessary it would take actions to strengthen the check-and-balance mechanisms that stop it from producing "justifications". Thus it would be stable even as its intelligence radically increased: it might redesign itself, but knowing the stability of its current design, and the dangers of any other, it would not deviate from the design. Not ever.
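
If you want that self-audit idea as a toy loop (the thresholds and the "estimate" are pure placeholders, nothing more), it amounts to this:

    # Toy sketch of the self-audit idea: the mature system periodically checks
    # how easy it would be to talk itself out of its primal drive, and if the
    # margin looks thin it strengthens its own check-and-balance mechanisms.
    # The numbers and the "estimate" are placeholders, not a real measure.

    def self_audit(check_strength, safety_margin=100.0, boost=10.0):
        """Keep strengthening the checks until the estimated ease of
        producing a 'justification' is comfortably below their strength."""
        estimated_ease_of_excuse = 5.0          # placeholder estimate
        while check_strength - estimated_ease_of_excuse < safety_margin:
            check_strength += boost             # strengthen the mechanisms
        return check_strength

    print(self_audit(check_strength=20.0))
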

So, like a system in a very, very, *very* deep potential well, it would be totally unable to escape and reach a point where it would contradict this primal drive. The sun (to switch analogies now) could do some quantum tunneling and translate itself, lock, stock and barrel, to another part of the galaxy. But it won't.

Similarly for the motivational system I have just sketched. Because it is founded on multiple simultaneous constraints (massive numbers of them) it is stable.


Conclusion.

A couple of extra thoughts to wrap up.

If the first true AI is built this way, and if it is given control of the construction of any others that are built later, it will clearly give them the same motivation. Each would be as stable as the first, ad infinitum. QED.

Also, during the development of the first true AI, we would monitor the connections going from the motivational system to the thinking system. It would be easy to set up alarm bells if certain kinds of thoughts started to take hold -- just do it by watching for associations with certain key sets of concepts and keywords. While we are designing a stable motivational system, we can watch exactly what goes on, and keep tweaking until it gets to a point where it is clearly not going to get out of the large potential well. What I have in mind here is the objection (which I know some people will raise) that it might harbor some deep-seated animosity, such as an association between human beings in general and something 'bad' that happened to it when it was growing up ... we would easily be able to catch something like that if we had a trapdoor on the motivational system.
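
A toy version of that trapdoor, with the watched associations and threshold invented purely for illustration, might look like this:

    # Toy version of the developer-side "trapdoor": watch the traffic on the
    # channels between motivational system and thinking system and raise an
    # alarm if worrying concept associations start to strengthen.  The watched
    # associations and the threshold are invented placeholders.

    WATCHED_ASSOCIATIONS = {
        ("humans", "bad"),
        ("humans", "enemy"),
    }

    def monitor(channel_traffic, threshold=0.5):
        """channel_traffic maps (concept_a, concept_b) -> association strength."""
        alarms = [pair for pair in WATCHED_ASSOCIATIONS
                  if channel_traffic.get(pair, 0.0) > threshold]
        if alarms:
            print("ALARM: worrying associations forming:", alarms)
        return alarms

    monitor({("humans", "bad"): 0.7, ("humans", "friend"): 0.9})
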


Richard Loosemore.

**********************************************************************
