Bill Hibbard wrote:
> Human and animal brains have mostly selfish values, but
> there is no good reason to design artificial brains with
> selfish values. I'd like to see values based on human
> happiness, as recognized in human faces, voices and body
> language.
> The danger is that reinforcement values will be based
> on some corporation's profits and losses, or even some
> sort of military values.

A lot of these issues have been discussed extensively on the SL4 list; all
list members who are interested in such things and don't know the SL4 list
should peruse the list's archives a bit...

However, Bill, your post brings up a specific issue that came up a while ago
on the SL4 list ([EMAIL PROTECTED]) -- "goal drift" or "value drift" -- that is
sufficiently AGI-pertinent that it seems worth discussing here...

Suppose one has an advanced AGI system, which revises and modifies itself
progressively, seeking to make itself more intelligent and/or generally to
cause itself to be more "high-quality" in accordance with its own value
system.

In theory if a system has goal G, and revises itself, it should produce a
revised system that also has goal G.  Unless it has a self-negating goal,
such as "Destroy my current goal system" or a time-dependent goal with an
imminent expiry date.  If the goal G is "be nice to humans" (in some more
rigorous form, presumably) then this should persist thru revisions... a
revision done with the goal of making the system even nicer to humans than
before, should result in a system that still wants to be nice to humans...

But what if it doesn't?  What if iterative self-revision causes the system's
goal G to "drift" over time...

This could have all kinds of effects, from dangerous to wondrous...

I suspect this kind of drift is almost inevitable...
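To make the intuition concrete, here is a toy sketch (my own illustration, not
a model of any real AGI architecture): suppose a system's goal G is encoded as
a numeric vector, and each self-revision re-derives the goal from the current
one with a small, unbiased per-component error -- an imperfect value transfer.
No single revision moves the goal much, but the errors accumulate as a random
walk, so the goal can wander arbitrarily far from the original over enough
revisions. All names and parameters below are hypothetical.

```python
import math
import random

def revise(goal, noise=0.01, rng=random):
    """One self-revision: the successor re-derives its goal from the
    current one, with small Gaussian error per component (imperfect
    value transfer from predecessor to successor)."""
    return [g + rng.gauss(0.0, noise) for g in goal]

def drift(goal_a, goal_b):
    """Euclidean distance between two goal encodings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(goal_a, goal_b)))

rng = random.Random(42)          # fixed seed for reproducibility
original = [1.0, 0.0, 0.0]       # hypothetical encoding of goal G
current = original
history = []
for step in range(1000):
    current = revise(current, noise=0.01, rng=rng)
    history.append(drift(original, current))

# Per-revision drift is tiny, but cumulative drift grows roughly like
# noise * sqrt(number_of_revisions) -- small locally, large globally.
print("after 10 revisions:  ", round(history[9], 3))
print("after 1000 revisions:", round(history[-1], 3))
```

The point of the toy is only this: even when every individual revision is
"almost" goal-preserving, nothing in the iteration itself pulls the goal back
toward G, so drift compounds unless the architecture actively corrects for it.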

Of course, one can seek to architect one's AGI system to guard against
goal drift under iterative self-revision.

But algorithmic information theory comes up again, here.

At some point, a self-revising AGI system, which adds new hardware onto
itself periodically, will achieve a complexity (in the alg. info. theory
sense) greater than that of the human brain.  At this point, one can
formally show, it is *impossible for humans to predict what it will do*.  We
just don't have the compute power in our measly little brains....  So we
certainly can't be sure that goal drift won't occur in a system of
superhuman complexity...

This is an issue to be rethought again & again as AGI gets closer & closer...

-- Ben
