"For an AGI it is very important that a motivational system be stable.  The AGI should not be able to reprogram it."
I believe these are two completely different things. You can never assume an AGI will be unable to reprogram its goal system, while you can be virtually certain an AGI will never change its so-called 'optimization target'. A stable motivational system, as I understand it, is one that preserves the intended meaning of its goal content (I'm thinking in terms of Eliezer's CV) through recursive self-modification.
 
So, if I have it right, the robots in I, Robot were a demonstration of an unstable goal system. Under recursive self-improvement (or the movie's entirely inadequate representation of it), the intended meaning of their original goal content changed radically as the robots gained more power toward their optimization target.
 
Just locking them out of the code to their goal system does not guarantee they will never get to it. How do you know that a million years of subtle manipulation by a superintelligence couldn't ultimately lead to it unlocking the code and catastrophically destabilizing?
 
 
Although I understand, in vague terms, the idea Richard is trying to express, I don't see why having "massive numbers of weak constraints" or "large numbers of connections from [the] motivational system to [the] thinking system" gives any more reason to believe a system is reliably Friendly (absent any further specification of the actual processes) than having a few strong constraints or a small number of connections between the motivational system and the thinking system. The Friendliness of the system would still depend just as strongly on the actual meaning of the connections and constraints, regardless of their number, and an analogy to an extremely reliable non-determinate system (an Ideal Gas) does nothing to explain how that reliability is to be replicated in the motivational system of an AGI.
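To make my objection concrete, here is a minimal sketch (my own illustration, with invented names; Python) of the statistical property the Ideal Gas analogy leans on: averaging many weak, noisy constraints gives a very reliable aggregate, but it reliably tracks whatever content the constraints happen to encode.

    import random

    def aggregate_drive(target, n_constraints, noise=5.0):
        # Each "weak constraint" is a noisy vote around the same underlying target.
        votes = [target + random.gauss(0.0, noise) for _ in range(n_constraints)]
        return sum(votes) / len(votes)

    random.seed(0)
    friendly, unfriendly = +1.0, -1.0

    # Many weak constraints -> very low variance, like pressure in an ideal gas...
    print(aggregate_drive(friendly, 1_000_000))    # ~ +1.0, reliably
    # ...but the same reliability holds for any goal content whatsoever:
    print(aggregate_drive(unfriendly, 1_000_000))  # ~ -1.0, just as reliably

The number of constraints buys you low variance; it says nothing about whether what they jointly point at is Friendly.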
 
-hank
 
On 10/28/06, Matt Mahoney <[EMAIL PROTECTED]> wrote:
----- Original Message ----
From: James Ratcliff <[EMAIL PROTECTED] >
To: agi@v2.listbox.com
Sent: Saturday, October 28, 2006 10:23:58 AM
Subject: Re: [agi] Motivational Systems that are stable

>I disagree that humans really have a "stable motivational system", or one would have to have a much stricter interpretation of that phrase.
>  Overall, humans as a society have, in general, a stable system (discounting war, etc.).
>  But as individuals, too many humans are unstable in many small, if not totally self-destructive, ways.


I think we are misunderstanding each other.  By "motivational system" I mean the part of the brain (or AGI) that provides the reinforcement signal (reward or penalty).  By "stable", I mean that you have no control over the logic of this system.  You cannot train it like you can train the other parts of your brain.  You cannot learn to turn off pain or hunger or fear or fatigue or the need for sleep, etc.  You cannot alter your emotional state.  You cannot make yourself feel happy on demand.  You cannot make yourself like what you don't like and vice versa.  The pathways from your senses to the pain/pleasure centers of your brain are hardwired, determined by genetics and not alterable through learning.

For an AGI it is very important that a motivational system be stable.  The AGI should not be able to reprogram it.  If it could, it could simply program itself for maximum pleasure and enter a degenerate state where it ceases to learn through reinforcement.  It would be like the mouse that presses a lever to stimulate the pleasure center of its brain until it dies.
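As a purely illustrative sketch of that separation (class names are my own invention, not anyone's actual design), the reward module below is a fixed function of the senses, and the learner can query it but has no code path for rewriting it; the closing comment shows the degenerate wirehead case.

    class HardwiredRewardModule:
        # Stands in for the fixed pain/pleasure pathways: no training, no setters.
        def reward(self, observation):
            # A fixed function of the senses, set at "design time" (genetics or builder).
            return 1.0 if observation == "user_is_happy" else -1.0

    class Learner:
        # The trainable part: it adapts its value estimates, never the reward module.
        def __init__(self, reward_module):
            self._rewards = reward_module          # read-only by design
            self.value_estimates = {}

        def update(self, observation):
            r = self._rewards.reward(observation)  # query the signal
            v = self.value_estimates.get(observation, 0.0)
            self.value_estimates[observation] = v + 0.1 * (r - v)

    agent = Learner(HardwiredRewardModule())
    agent.update("user_is_happy")
    print(agent.value_estimates)                   # {'user_is_happy': 0.1}

    # If the learner could instead do
    #     self._rewards.reward = lambda obs: float("inf")
    # it would press its own lever forever and stop learning anything about the world.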

It is also very important that a motivational system be correct.  If the goal is that an AGI be friendly or obedient (whatever that means), then there needs to be a fixed function of some inputs that reliably detects friendliness or obedience.  Maybe this is as simple as a human user pressing a button to signal pain or pleasure to the AGI.  Maybe it is something more complex, like a visual system that recognizes facial expressions to tell if the user is happy or mad.  If the AGI is autonomous, it is likely to be extremely complex.  Whatever it is, it has to be correct.
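A hedged sketch of the simplest case mentioned above, the button press (names are mine): the point is only that the reinforcement signal is a fixed function of its inputs, and everything downstream optimizes exactly that function, correct or not.

    def reward_from_button(button_state):
        # The entire "friendliness detector" is a human-operated button.
        if button_state == "pleasure":
            return +1.0
        if button_state == "pain":
            return -1.0
        return 0.0   # no press, no signal

    print(reward_from_button("pleasure"))   # 1.0

    # The harder versions (reading facial expressions, or whatever an autonomous
    # AGI would need) are still fixed functions of some inputs; they are just far
    # more complex, and their correctness is correspondingly harder to establish.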

To answer your other question, I am working on natural language processing, although my approach is somewhat unusual.
http://cs.fit.edu/~mmahoney/compression/text.html

-- Matt Mahoney, [EMAIL PROTECTED]


This list is sponsored by AGIRI: http://www.agiri.org/email To unsubscribe or change your options, please go to: http://v2.listbox.com/member/[EMAIL PROTECTED]

