Matt Mahoney wrote:
Richard,
Let me make sure I understand your proposal.  You propose to program
friendliness into the motivational structure of the AGI as tens of thousands
of hand-coded soft constraints or rules.  Presumably with so many rules, we
should be able to cover every conceivable situation now or in the future where
the AGI would have to make a moral decision.  Among these rules: the AGI is
not allowed to modify the function that computes its reward signal, nor is it
allowed to create another AGI with a different function.

You argue that the reward function becomes more stable after RSI.  I presume
this is because when there are a large number of AGIs, they will be able to
observe any deviant behavior, then make a collective decision as to whether
the deviant should be left alone, reprogrammed, or killed.  This policing
would be included in the reward function.

Presumably the reward function is designed by a committee of upstanding
citizens who have reached a consensus on what it means to be friendly in every
possible scenario.  Once designed, it can never be changed, because if there
were any mechanism by which all of the AGIs could be updated at once, there
would be a single point of failure.  This is not allowed.  On the other hand,
if the AGIs were updated one at a time (allowed only with human permission),
then the resulting deviant behavior would be noticed by the other AGIs before
they could be updated.  So the reward function remains fixed.

Is this correct?

Well, I am going to assume that Mark is wrong and that you are not trying to be sarcastic, but genuinely mean to pose these questions.

You have misunderstood the design at a very deep level, so none of the above would happen.

The multiple constraints are not explicitly programmed into the system in the form of semantically interpretable statements (like Asimov's laws), nor would there be a simple "reward function", nor would there be a committee of experts who sat down and tried to write out a complete list of all the rules. These are all old-AI concepts (conventional, non-complex AI); they simply do not map onto the system at all.

The AGI has a motivational system that *biases* the cloud of concepts in one direction or another, to make the system have certain goals. The nature of this bias is that, during development, the concepts themselves all grew from simple primitives (so primitive that they are not even ideas, but just sources of influence on the concept-building process), and these simple primitives reach out through the entire web of adult concepts.

This is a difficult idea to grasp, I admit, but the consequence of that type of system design is that, for example, the general idea of "feeling empathy for the needs and aspirations of the entire human race" is not represented in the system as an explicit memory location that says "Rule number 71, as decided by the Committee of World AGI Ethics Experts, is that you must feel empathy for the entire human race". Instead, the thing that we externally describe as "empathy" is just the collective result of a massive number of learned concepts and their connections.

This makes "empathy" a _systemic_ characteristic, intrinsic to the entire system, not a localizable rule.

The empathy feeling, to be sure, is controlled by roots that go back to the motivational system, but these roots would be built in such a way that tampering or malfunction would:

(a) not be able to happen without a huge intervention, which would be easily noticed, and

(b) not cause any catastrophic behavior even if it did go wrong, because the malfunctioning of the motivational system would render the entire system useless.

Notice that in a real human, damage to the empathy component can cause trouble, but that is precisely because we have other, dangerous components, such as our aggression modules, which can take over. No such modules would be present in the AGI, so it would degrade gracefully if the empathy system (for some bizarre reason) were interfered with.
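As a toy illustration of that graceful degradation (again, hypothetical numbers, not the design): competence and motivational coherence are entangled, so corrupting the motives collapses capability rather than redirecting it, and there is no standby aggression module for the damage to hand control to.

# Toy sketch of graceful degradation; everything here is hypothetical.

def coherence(motives):
    """Crude integrity measure: how mutually consistent the motives are."""
    mean = sum(motives) / len(motives)
    variance = sum((m - mean) ** 2 for m in motives) / len(motives)
    return 1.0 / (1.0 + variance)

def effective_capability(motives, plan_quality):
    # Capability is gated on motivational coherence: a damaged system
    # becomes useless, not malevolent.
    return plan_quality * coherence(motives)

healthy  = [1.00, 1.01, 0.99, 1.00]
tampered = [1.00, -5.00, 0.99, 7.00]            # hypothetical tampering
print(effective_capability(healthy, 0.9))       # ~0.90: near full capability
print(effective_capability(tampered, 0.9))      # ~0.05: useless, not redirected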

And to answer your general question: the empathy function would not be constrained to be fixed, because it would be dependent on the wishes of humanity. Or rather, the *nature* of the empathy function would stay the same, but the content (the expression of the empathy) would stay locked to the desires of humanity, in perpetuity.
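A last toy sketch to pin down that distinction (hypothetical names throughout): the *form* of the function is invariant, while the preferences it consults are supplied live, not frozen in at design time by any committee.

# Hypothetical sketch: fixed nature, tracking content.

def empathic_choice(options, human_preference):
    # The *nature*: always pick the option that best serves current human
    # preferences. This rule never changes.
    return max(options, key=human_preference)

# The *content*: a live preference source, consulted in perpetuity, not a
# constant baked in by a design committee.
prefs_today = {"cure disease": 0.9, "build monuments": 0.2}
print(empathic_choice(list(prefs_today), lambda o: prefs_today.get(o, 0.0)))
# -> cure disease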


Hope that answers the questions.



Richard Loosemore





--- Richard Loosemore <[EMAIL PROTECTED]> wrote:

Matt Mahoney wrote:
--- Richard Loosemore <[EMAIL PROTECTED]> wrote:

Derek Zahn wrote:
Richard Loosemore writes:

 > It is much less opaque.
 >
 > I have argued that this is the ONLY way that I know of to ensure that
 > AGI is done in a way that allows safety/friendliness to be guaranteed.
 >
 > I will have more to say about that tomorrow, when I hope to make an
 > announcement.

Cool. I'm sure I'm not the only one eager to see how you can guarantee (read: prove) such specific detailed things about the behaviors of a complex system.
Hmmm... do I detect some skepticism?  ;-)
I remain skeptical.  Your argument applies to an AGI not modifying its own
motivational system.  It does not apply to an AGI making modified copies of
itself.  In fact you say:
Not correct, I am afraid: I specifically emphasize that the AGI is allowed to modify its own motivational system. I don't know how you got the opposite idea. (I haven't had time to review my text, so apologies if it was my fault and I accidentally gave the wrong impression .... but the whole point of this essay was to suggest a way to guarantee friendliness under any circumstances, including self-improvement.)

Also, during the development of the first true AI, we would monitor the connections going from motivational system to thinking system. It would be easy to set up alarm bells if certain kinds of thoughts started to take hold -- just do it by watching for associations with certain key sets of concepts and keywords. While we are designing a stable motivational system, we can watch exactly what goes on, and keep tweaking until it gets to a point where it is clearly not going to get out of the large potential well.
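[Illustration of the kind of alarm bell described above -- a minimal sketch with entirely hypothetical cluster names and thresholds, not the actual monitoring scheme:]

# Toy sketch: fire an alarm when a thought trace's active concepts
# overlap heavily with a flagged cluster. All names are hypothetical.
DEVIANT_CLUSTERS = {
    "reward_tampering": {"reward", "modify", "own", "signal"},
    "deception": {"conceal", "mislead", "operator"},
}

def check_thought_trace(active, threshold=0.75):
    """Return (cluster, overlap) pairs whose overlap crosses the threshold."""
    alarms = []
    for name, cluster in DEVIANT_CLUSTERS.items():
        overlap = len(active & cluster) / len(cluster)
        if overlap >= threshold:
            alarms.append((name, overlap))
    return alarms

print(check_thought_trace({"reward", "modify", "own", "planning"}))
# -> [('reward_tampering', 0.75)]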
I do not see how this illustrates your point above.


You refer to the humans building the first AGI.  Humans, being imperfect,
might not get the algorithm for friendliness exactly right in the first
iteration.  So it will be up to the AGI to tweak the second copy a little more
(according to the first AGI's interpretation of friendliness).  And so on.  So
the goal drifts a little with each iteration.  And we have no control over
which way it drifts.
What an extraordinary statement to make!

The purpose of the essay was to argue that with each iteration it digs itself deeper into the same pattern and cannot drift out into an unfriendly state.
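[Illustration of the attractor claim -- a toy model, not the essay's formalism: treat the friendly state as the bottom of a deep potential well and each generation's self-modification as a noisy step; the basin keeps pulling the state back, so the errors do not accumulate:]

# Toy attractor model (hypothetical numbers): each generation of
# self-modification adds noise, but the well pulls the state back.
import random

random.seed(1)

def well_gradient(x):
    """Gradient of a deep well centered on the friendly state x = 0."""
    return 2.0 * x

x = 0.3  # imperfect first build: slightly off the friendly state
for generation in range(50):
    noise = random.gauss(0.0, 0.05)          # imperfect copying / tweaking
    x = x - 0.2 * well_gradient(x) + noise   # pulled back into the basin
print(f"after 50 generations: x = {x:+.3f}")  # hovers near 0; no cumulative drift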

But you reply to this by simply stating that the opposite is going to be the case, without saying why. Which part of my argument did you decide was wrong, such that you could state the opposite conclusion?



Richard Loosemore







-- Matt Mahoney, [EMAIL PROTECTED]



