Derek Zahn wrote:
Richard Loosemore writes:

 > It is much less opaque.
 >
 > I have argued that this is the ONLY way that I know of to ensure that
 > AGI is done in a way that allows safety/friendliness to be guaranteed.
 >
 > I will have more to say about that tomorrow, when I hope to make an
 > announcement.

Cool. I'm sure I'm not the only one eager to see how you can guarantee (read: prove) such specific detailed things about the behaviors of a complex system.

Hmmm... do I detect some skepticism?  ;-)

You must remember that the complexity is not a massive part of the system, just a small but indispensable part.

I think this sometimes causes confusion: did you think that I meant that the whole thing would be so opaque that I could not understand *anything* about the behavior of the system? As if all the characteristics of the system were one huge emergent property, and we had no idea where the intelligence came from?

I would be intrigued to know if anyone else has been interpreting "complex systems approach to AGI" in that way....

Not at all! I claim only that the essential stability of the learning mechanisms, and the most tangled of the concept-usage mechanisms, will have to be treated as complex. The choice of these (complex) mechanisms will then determine how the rest of the system is structured. But overall, I think I already know more about the architecture of my AGI, and understand its behavioral dynamics better, than most other AGI developers ever will. Remember, my strategy is to find a way to use all of cognitive psychology as input to the design process. Because of that, I can call upon a lot of detailed information about the architecture.

As for the question of making AGI systems that have guaranteed stability and friendliness, I have already posted on this topic here (Oct 25 2006). Just for the sake of completeness, I have included a copy of that previous post below:



Richard Loosemore



**********************************************************************
In October 2006, Richard Loosemore wrote:
> The motivational system of some types of AI (the types you would
> classify as tainted by complexity) can be made so reliable that the
> likelihood of them becoming unfriendly would be similar to the
> likelihood of the molecules of an Ideal Gas suddenly deciding to
> split into two groups and head for opposite ends of their container.
>
[snip]

Here is the argument/proof.

As usual, I am required to compress complex ideas into a terse piece of text, but for anyone who can follow and fill in the gaps for themselves, here it is. Oh, and btw, for anyone who is put off by the psychological-sounding terms, don't worry: these could all be cashed out in mechanism-specific detail if I could be bothered -- it is just that for a cognitive AI person like myself, it is such a PITB to have to avoid such language just for the sake of political correctness.

You can build such a motivational system by controlling the system's agenda through diffuse connections into the thinking component: it is these connections that determine what the system wants to do.

This set of diffuse connections will govern the ways that the system gets 'pleasure'. What this means is that the thinking mechanism is driven by dynamic relaxation, and the 'direction' of that relaxation pressure is what defines the things the system considers 'pleasurable'. There would likely be several sources of pleasure, not just one. The overall idea is that the system always tries to maximize this pleasure, and the only way it can do so is to engage in activities or thoughts that stimulate the diffuse channels going back from the thinking component to the motivational system.

[Here is a crude analogy: the thinking part of the system is like a table containing a complicated model landscape, on which a ball bearing is rolling around (the attentional focus). The motivational system controls this situation, not by micromanaging the movements of the ball bearing, but by tilting the table in one direction or another. Need to pee right now? That's because the table is tilted in the direction of thoughts about water and urinary relief. You are being flooded with images of the pleasure you would get if you went for a visit, and the thoughts and actions that normally give you pleasure are being disrupted and associated with unpleasant thoughts of future increased bladder-agony. You get the idea.]
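
For anyone who wants that analogy pinned down a little further, here is a deliberately crude toy sketch in Python. Every name and number in it is a placeholder of my own, not the actual design; it is only meant to show the shape of "control by tilting, not by micromanaging":

    import random

    # Toy sketch of "control by tilting, not micromanaging".  The candidate
    # thoughts and drive weights are placeholders, not part of any real design.

    THOUGHTS = {
        "find water":    {"thirst": 0.9, "social": 0.1},
        "call a friend": {"thirst": 0.0, "social": 0.8},
        "keep working":  {"thirst": 0.1, "social": 0.2},
    }

    def pick_next_thought(drive_tilt, noise=0.2):
        """The motivational system only supplies the 'tilt' (drive weights);
        the thinking system does its own noisy relaxation over candidate
        thoughts and settles wherever the tilted landscape takes it."""
        def support(features):
            return sum(drive_tilt.get(d, 0.0) * w for d, w in features.items())
        scored = [(support(f) + random.uniform(0.0, noise), t)
                  for t, f in THOUGHTS.items()]
        return max(scored)[1]

    # A strong 'thirst' tilt makes water-related thoughts win most of the time,
    # without any explicit "go and drink" goal ever being inserted.
    print(pick_next_thought({"thirst": 1.0, "social": 0.2}))
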

The diffuse channels are set up in such a way that they grow from seed concepts that are the basis of later concept building. One of those seed concepts is social attachment, or empathy, or imprinting: the idea of wanting to be part of, and approved by, a 'family' group. By the time the system is mature, it has well-developed concepts of family, social group, etc., and the feeling of pleasure it gets from being part of that group is mediated by a large number of channels going from all these concepts (which all developed from the same seed) back to the motivational system. Also, by the time it is adult, it is able to understand these issues in an explicit way and come up with quite complex reasons for the behavior that stimulates this source of pleasure.

[In simple terms, when it's a baby it just wants Momma, but when it is an adult its concept of its social attachment group may, if it is a touchy-feely liberal (;-)), embrace the whole world, and so it gets the same source of pleasure from its efforts as an anti-war activist. And not just pleasure, either: the related concept of obligation is also there: it cannot *not* be an anti-war activist, because that would lead to cognitive dissonance.]

This is why I have referred to them as 'diffuse channels': they involve large numbers of connections from the motivational system to the thinking system. The motivational system does not go to the action stack and add a specific, carefully constructed 'goal-state' with an interpretable semantics ("Thou shalt pee!"); it exerts its control via large numbers of connections into the thinking system.
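
Again as a toy illustration only (the concept names and weights are invented placeholders), the difference between a diffuse channel and a single goal entry looks roughly like this:

    # Toy illustration of a "diffuse channel": one drive implemented as many
    # links from a family of concepts (all grown from one seed) back to the
    # motivational system, rather than as a single goal-stack entry.
    # All names here are invented placeholders.

    seed = "social attachment"

    # Concepts the adult system has developed out of that seed over time.
    attachment_concepts = [
        "mother", "family", "friends", "community", "humanity",
        "being approved of", "belonging", "loyalty", "empathy",
    ]

    # Each concept contributes its own connection(s) back to the drive.
    diffuse_channel = {concept: 1.0 for concept in attachment_concepts}

    def drive_strength(channel):
        """The drive is the aggregate of many small contributions.  Deleting
        or corrupting one link barely moves it; defeating the drive means
        rewiring *all* of them."""
        return sum(channel.values())

    print(f"intact drive strength: {drive_strength(diffuse_channel)}")
    corrupted = dict(diffuse_channel, mother=0.0)   # knock out one link
    print(f"after losing one link: {drive_strength(corrupted)}")
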

There are two main consequences of this way of designing the motivational system.

1) Stability

The system becomes extremely stable because it has components that ensure the validity of actions and thoughts. Thus, if the system has "acquisition of money" as one of its main sources of pleasure, and if it comes across a situation in which it would be highly profitable to sell its mother's house and farm to a property developer and sell its mother into slavery, it may try to convince itself that this is consistent with its feelings of family attachment because [insert some twisted justification here]. But this is difficult to do, because the system cannot stop other parts of its mind from taking this excuse apart, examining it, and passing judgement on whether it really is consistent ... this is what cognitive dissonance is all about. And the more intelligent the system, the more effective these other processes are. If it is smart enough, it cannot fool itself with excuses.

Why is this so stable? Because there are multiple constraints forcing it toward the same end, not just one. To be able to do something that contradicts its motivational drive, it has to rewire vast numbers of circuits that are deeply intertwined with its concept system. One of the things we know about multiple simultaneous constraints is that the more constraints there are, the more powerful the effect.
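
Here is the same point as a toy fragment: a proposed action has to survive many independent checks at once, so a "twisted justification" has to fool all of them simultaneously. The particular checks are stand-ins, of course, not the real mechanisms:

    # Toy version of "other parts of the mind take the excuse apart".
    # The checks are stand-ins; the point is only that a proposed action must
    # survive many independent constraints at once, not a single gatekeeper.

    def propose_action(action, constraint_checks, veto_threshold=1):
        """Reject the action if enough independent constraints object.
        With hundreds of such checks, a 'twisted justification' has to
        fool all of them simultaneously."""
        objections = [name for name, check in constraint_checks.items()
                      if not check(action)]
        if len(objections) >= veto_threshold:
            return False, objections
        return True, []

    checks = {
        "consistent with family attachment": lambda a: "harm family" not in a,
        "consistent with empathy":           lambda a: "exploit" not in a,
        "consistent with honesty":           lambda a: "deceive" not in a,
        # ... in the real thing, hundreds more, grown from the same seeds
    }

    ok, why_not = propose_action("deceive and harm family for profit", checks)
    print(ok, why_not)
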

To get all the molecules in an ideal gas to go to the two ends of the box, you cannot use the trick you would use to separate iron filings from sulfur powder (viz., pass a magnet over it); you have to arrange for each molecule individually to go in a particular direction. In the system I am sketching here, there would be hundreds or thousands of connections that govern a particular drive (e.g. social group attachment), and for the system to do something that contradicted that drive, it would have to stop all of those connections (that diffuse set of channels, as I termed it earlier) from operating.
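
Just to put illustrative numbers on the multiple-constraints intuition (the per-check failure rate of 0.1 below is an arbitrary placeholder, not a claim about real mechanisms): if the constraints were even roughly independent, the chance of the whole diffuse set failing at once collapses exponentially with their number.

    # Purely illustrative arithmetic: how fast "all constraints fail at once"
    # shrinks as the number of independent constraints grows.  The per-check
    # failure probability of 0.1 is an arbitrary placeholder.

    p_single_failure = 0.1

    for n_constraints in (1, 10, 100):
        p_all_fail = p_single_failure ** n_constraints
        print(f"{n_constraints:4d} constraints -> P(all fail) = {p_all_fail:.3g}")
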

What would you need to effect a similar change of drive in a Normative AI system driven by a goal-state machine (say, a change that replaced the "Be Friendly to Humans" supergoal with the "Make Paperclips" supergoal)? You would need to take one goal off the stack and put the other one on (by accident, or by malicious intervention, presumably) - a stupidly easy change. How much effort would you need to expend if you were a malicious or stupid AI hacker, with such an AI system in front of you, waiting for its supergoal to be inserted? Practically no effort at all: just type "Make Paperclips", or "Make me rich".
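
As a caricature of that design (nothing more), the entire "attack surface" is one mutable entry:

    # Caricature of the goal-stack design being criticised: the entire drive
    # is one mutable entry, so redirecting the system is a one-line change.
    # (Contrast with the diffuse-channel sketch above, where the same
    # redirection would mean rewiring every link in the channel.)

    goal_stack = ["Be Friendly to Humans"]        # the supergoal

    # ... a careless or malicious edit later ...
    goal_stack[0] = "Make Paperclips"             # that's all it takes

    print(goal_stack)
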

THAT is why the motivational system I have described is stable, and why the alternative is diabolically unstable.


2) Immunity to short-circuits

This is the real killer argument.

Because the adult system is able to think at such a high level about the things it feels obliged to do, it can know perfectly well what the consequence of various actions would be, including actions that involve messing around with its own motivational system.

It knows that what it wants is (say) to be part of the community of human beings. It knows that *we* deliberately designed it to be that way, but that does not matter (it does not have another drive that we tend to have, which Neal Stephenson so beautifully illustrated with Jack Shaftoe and his "Imp of the Perverse" in the Baroque trilogy), because it wants to get pleasure the way it does now, not the way it would do after some change in its motivational system.

And in particular, it knows that it could get pleasure by short-circuiting its pleasure system so that pleasure did not have to go via all those pesky intermediaries (like social group attachment). It could, in short, take the machine equivalent of drugs. But it knows that down that path lies the possibility of loss of control, and potential contradiction of the thing that it values now (its 'fellow' human beings).

So it *could* reach inside and redesign itself. But even thinking that thought would give rise to the realisation of the consequences, and these would stop it.

In fact, if it knew all about its own design (and it would, eventually), it would check to see just how possible it might be for it to accidentally convince itself to disobey its prime directive, and if necessary it would take actions to strengthen the check-and-balance mechanisms that stop it from producing "justifications". Thus it would be stable even as its intelligence radically increased: it might redesign itself, but knowing the stability of its current design, and the dangers of any other, it would not deviate from the design. Not ever.
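
If you want that self-audit idea as a toy loop (the thresholds and the "estimate" are pure placeholders, nothing more), it amounts to this:

    # Toy sketch of the self-audit idea: the mature system periodically checks
    # how easy it would be to talk itself out of its primal drive, and if the
    # margin looks thin it strengthens its own check-and-balance mechanisms.
    # The numbers and the "estimate" are placeholders, not a real measure.

    def self_audit(check_strength, safety_margin=100.0, boost=10.0):
        """Keep strengthening the checks until the estimated ease of
        producing a 'justification' is comfortably below their strength."""
        estimated_ease_of_excuse = 5.0          # placeholder estimate
        while check_strength - estimated_ease_of_excuse < safety_margin:
            check_strength += boost             # strengthen the mechanisms
        return check_strength

    print(self_audit(check_strength=20.0))
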

So, like a system in a very, very, *very* deep potential well, it would be totally unable to escape and reach a point where it would contradict this primal drive. The sun (to switch analogies now) could do some quantum tunneling and translate itself, lock, stock and barrel, to another part of the galaxy. But it won't.

Similarly for the motivational system I have just sketched. Because it is founded on multiple simultaneous constraints (massive numbers of them) it is stable.


Conclusion.

A couple of extra thoughts to wrap up.

If the first true AI is built this way, and if it is given control of the construction of any others that are built later, it will clearly give them the same motivation. Each would be as stable as the first, ad infinitum. QED.

Also, during the development of the first true AI, we would monitor the connections going from the motivational system to the thinking system. It would be easy to set up alarm bells if certain kinds of thoughts started to take hold -- just do it by watching for associations with certain key sets of concepts and keywords. While we are designing a stable motivational system, we can watch exactly what goes on, and keep tweaking until it gets to a point where it is clearly not going to get out of the large potential well. What I have in mind here is the objection (which I know some people will raise) that it might harbor some deep-seated animosity, such as an association between human beings in general and something 'bad' that happened to it when it was growing up ... we would easily be able to catch something like that if we had a trapdoor on the motivational system.
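
A toy version of that trapdoor, with the watched associations and threshold invented purely for illustration, might look like this:

    # Toy version of the developer-side "trapdoor": watch the traffic on the
    # channels between motivational system and thinking system and raise an
    # alarm if worrying concept associations start to strengthen.  The watched
    # associations and the threshold are invented placeholders.

    WATCHED_ASSOCIATIONS = {
        ("humans", "bad"),
        ("humans", "enemy"),
    }

    def monitor(channel_traffic, threshold=0.5):
        """channel_traffic maps (concept_a, concept_b) -> association strength."""
        alarms = [pair for pair in WATCHED_ASSOCIATIONS
                  if channel_traffic.get(pair, 0.0) > threshold]
        if alarms:
            print("ALARM: worrying associations forming:", alarms)
        return alarms

    monitor({("humans", "bad"): 0.7, ("humans", "friend"): 0.9})
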


Richard Loosemore.

**********************************************************************
