Matt,
Is there any particular reason why you're being so obnoxious?
His proposal said *nothing* of the sort and your sarcasm has buried any
value your post might have had.
----- Original Message -----
From: "Matt Mahoney" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Monday, October 01, 2007 12:57 PM
Subject: Re: [agi] Religion-free technical content
Richard,
Let me make sure I understand your proposal. You propose to program
friendliness into the motivational structure of the AGI as tens of thousands
of hand-coded soft constraints or rules. Presumably with so many rules, we
should be able to cover every conceivable situation, now or in the future,
where the AGI would have to make a moral decision. Among these rules: the
AGI is not allowed to modify the function that computes its reward signal,
nor is it allowed to create another AGI with a different function.
You argue that the reward function becomes more stable after RSI. I presume
this is because when there are a large number of AGIs, they will be able to
observe any deviant behavior, then make a collective decision as to whether
the deviant should be left alone, reprogrammed, or killed. This policing
would be included in the reward function.
Presumably the reward function is designed by a committee of upstanding
citizens who have reached a consensus on what it means to be friendly in
every possible scenario. Once designed, it can never be changed, because if
there were any mechanism by which all of the AGIs could be updated at once,
there would be a single point of failure. This is not allowed. On the other
hand, if the AGIs were updated one at a time (allowed only with human
permission), then the resulting deviant behavior would be noticed by the
other AGIs before they could be updated. So the reward function remains
fixed.
Is this correct?
--- Richard Loosemore <[EMAIL PROTECTED]> wrote:
Matt Mahoney wrote:
> --- Richard Loosemore <[EMAIL PROTECTED]> wrote:
>
>> Derek Zahn wrote:
>>> Richard Loosemore writes:
>>>
>>> > It is much less opaque.
>>> >
>>> > I have argued that this is the ONLY way that I know of to ensure
>>> > that AGI is done in a way that allows safety/friendliness to be
>>> > guaranteed.
>>> >
>>> > I will have more to say about that tomorrow, when I hope to make an
>>> > announcement.
>>>
>>> Cool. I'm sure I'm not the only one eager to see how you can guarantee
>>> (read: prove) such specific detailed things about the behaviors of a
>>> complex system.
>> Hmmm... do I detect some skepticism? ;-)
>
> I remain skeptical. Your argument applies to an AGI not modifying its own
> motivational system. It does not apply to an AGI making modified copies
> of itself. In fact you say:
Not correct, I am afraid: I specifically emphasize that the AGI is
allowed to modify its own motivational system. I don't know how you got
the opposite idea. (I haven't had time to review my text, so apologies
if it was my fault and I did accidentally give the wrong impression ...
but the whole point of this essay was to suggest a way to guarantee
friendliness under any circumstances, including self-improvement.)
>> Also, during the development of the first true AI, we would monitor the
>> connections going from motivational system to thinking system. It would
>> be easy to set up alarm bells if certain kinds of thoughts started to
>> take hold -- just do it by associating with certain key sets of
>> concepts and keywords. While we are designing a stable motivational
>> system, we can watch exactly what goes on, and keep tweaking until it
>> gets to a point where it is clearly not going to get out of the large
>> potential well.
I do not see how this illustrates your point above.
> You refer to the humans building the first AGI. Humans, being imperfect,
> might not get the algorithm for friendliness exactly right in the first
> iteration. So it will be up to the AGI to tweak the second copy a little
> more (according to the first AGI's interpretation of friendliness). And
> so on. So the goal drifts a little with each iteration. And we have no
> control over which way it drifts.
What an extraordinary statement to make!
The purpose of the essay was to argue that with each iteration it digs
itself deeper into the same pattern and cannot drift out into an
unfriendly state.
But you reply to this by simply stating that the opposite will be the
case, without saying why. Which part of my argument did you decide was
wrong, such that you could assert the opposite conclusion?
Richard Loosemore
-----
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:
http://v2.listbox.com/member/?&
-- Matt Mahoney, [EMAIL PROTECTED]