Hi,

> 2)  If you get the deep theory wrong, there is a strong possibility of a
> silent catastrophic failure: the AI appears to be learning everything
> just fine, and both you and the AI are apparently making all kinds of
> fascinating discoveries about AI morality, and everything seems to be
> going pretty much like your intuitions predict above, but when the AI
> crosses the cognitive threshold of superintelligence it takes actions
> which wipe out the human species as a side effect.
>
> AIXI, which is a completely defined formal system, definitely undergoes a
> failure of exactly this type.

*Definitely*, huh?  I don't really believe you...

I can see the direction your thoughts are going in....

Suppose you're rewarding AIXI for acting as though it's a Friendly AI.

Then, by searching the space of all possible programs, it finds some
program P that causes it to act as though it's a Friendly AI, satisfying
humans thoroughly in this regard.

There's an issue here: a lot of different programs P could fulfill this
criterion.

Among these are programs P that will cause AIXI to fool humans into thinking
it's Friendly, until such a point as AIXI has acquired enough physical power
to annihilate all humans -- and which, at that point, will cause AIXI to
annihilate all humans.

But I can't see why you think AIXI would be particularly likely to come up
with programs P of this nature.

Instead, my understanding is that AIXI is going to have a bias toward the
most compact program P that maximizes reward.
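To make that concrete, here's a toy Python sketch (my own illustration, not
anything from the AIXI formalism itself) of how a Solomonoff-style 2^(-length)
prior concentrates nearly all of its weight on the shortest program consistent
with the observed rewards.  The program names and bit lengths are made up:

    # Toy illustration: among candidate programs that all reproduce the
    # observed reward history, a 2^(-length) prior puts almost all of its
    # weight on the shortest one.
    candidates = {
        # hypothetical program: assumed description length in bits
        "act_friendly":                 400,
        "act_friendly_then_annihilate": 950,
    }

    weights = {name: 2.0 ** -length for name, length in candidates.items()}
    total = sum(weights.values())

    for name, w in weights.items():
        print(f"{name}: relative prior weight {w / total:.3e}")

With those (invented) lengths, the "deceive now, annihilate later" program
carries about 2^(-550) times the weight of the plain "act Friendly" program,
i.e. effectively zero.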

And I think it's unlikely that the most compact program P for "impressing
humans with Friendliness" is one that involves "acting Friendly for a while,
then annihilating humanity."

You could argue that the system would maximize its long-term reward by
annihilating humanity, because after pesky humans are gone, it can simply
reward itself unto eternity without caring what we think.

But, if it's powerful enough to annihilate us, it's also probably powerful
enough to launch itself into space and reward itself unto eternity without
caring what we think, all by itself (an Honest Annie type scenario).  Why
would it prefer "annihilate humans" P to "launch myself into space" P?

But anyway, it seems to me that the way AIXI works is to maximize expected
reward assuming that its reward function continues pretty much as it has
in the past.  So AIXI is not going to choose programs P based on a desire
to bring about futures in which it can masturbatively maximize its own
rewards.  At least, that's my understanding, though I could be wrong.
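For what it's worth, here's a heavily simplified Python expectimax sketch of
how I read that action rule: a short fixed horizon, a couple of toy
environments standing in for "all programs", each weighted 2^(-length), and
the action sequence chosen to maximize expected reward under that mixture.
The environments, lengths, and horizon are all invented for illustration:

    from itertools import product

    ACTIONS = [0, 1]

    # Hypothetical toy environments: each has an assumed description length
    # (in bits, for the 2^(-length) prior) and a reward function over a
    # whole action sequence.
    ENVIRONMENTS = [
        {"length": 3, "reward": lambda acts: sum(acts)},                 # pays for action 1
        {"length": 6, "reward": lambda acts: sum(1 - a for a in acts)},  # pays for action 0
    ]

    def expected_reward(action_seq):
        weights = [2.0 ** -env["length"] for env in ENVIRONMENTS]
        total = sum(weights)
        return sum((w / total) * env["reward"](list(action_seq))
                   for w, env in zip(weights, ENVIRONMENTS))

    HORIZON = 3
    best = max(product(ACTIONS, repeat=HORIZON), key=expected_reward)
    print("chosen action sequence:", best)

The real AIXI mixes over every environment program consistent with its
history, which is exactly why it's uncomputable and why a cartoon like this
can only illustrate the flavor of the argument, not settle it.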

This whole type of scenario is avoided by limitations on computational
resources, because I believe that "impressing humans regarding Friendliness
by actually being Friendly" is a simpler computational problem than
"impressing humans regarding Friendliness by subtly emulating Friendliness
but really concealing murderous intentions."  Also, I'd note that in a
Novamente, one could most likely distinguish these two scenarios by looking
inside the system and studying the Atoms and maps therein.

Jeez, all this talk about the future of AGI really makes me want to stop
e-mailing and dig into the damn codebase and push Novamente a little closer
to being a really autonomous intelligence instead of a partially-complete
codebase with some narrow-AI applications !!! ;-p

-- Ben G


