--- On Wed, 8/27/08, Vladimir Nesov <[EMAIL PROTECTED]> wrote:
> One of the main motivations for the fast development of Friendly AI is
> that it can be allowed to develop superintelligence to police the
> human space from global catastrophes like Unfriendly AI, which
> includes as a special case a hacked design of Friendly AI made
> Unfriendly.
That is certainly the most compelling reason to do this kind of research. And I
wish I had something more than "disallow self-modifying approaches", as if that
would be enforceable. But I just don't see Friendliness as attainable, in
principle, so I think we should treat this like nuclear weaponry: do our best
to prevent it.
> If we can understand it and know that it does what we want, we don't
> need to limit its power, because it becomes our power.
Whose power? Who is referred to by "our"? More importantly, whose agenda is
served by this power? Power corrupts. One culture's good is another's evil.
What we call Friendly, our political enemies might call Unfriendly. If you
think no agenda would be served, you're naive. And if you think the AGI would
somehow know not to serve its masters, in the name of Friendliness to humanity,
then you believe in an objective morality... in a universally compelling
argument.
> With simulated intelligence, understanding might prove as difficult as
> in neuroscience, studying resulting design that is unstable and thus in
> long term Unfriendly. Hacking it to a point of Friendliness would be
> equivalent to solving the original question of Friendliness,
> understanding what you want, and would in fact involve something close
> to hands-on design, so it's unclear how much help experiments can
> provide in this regard relative to default approach.
Agreed, although I would not advocate hacking Friendliness. I'd advocate
limiting the simulated environment in which the agent exists. The point of this
line of reasoning is to avoid the Singularity, period. Perhaps that's every bit
as unrealistic as I believe Friendliness to be.
> It's self-improvement, not self-retardation. If modification is
> expected to make you unstable and crazy, don't do that modification,
> add some redundancy instead and think again.
The question is whether it's possible to know in advance that a modification
won't be unstable, within the finite computational resources available to an
AGI. With the kind of recursive scenarios we're talking about, simulation is
the only way to guarantee that a modification is an improvement, and an AGI
simulating its own modified operation requires exponentially increasing
resources, particularly as it simulates itself simulating itself simulating
itself, and so on for N future modifications.
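To make the resource argument concrete, here is a toy back-of-the-envelope sketch (the constant overhead factor and the function name are my own illustrative assumptions, not anything specified above): if each layer of self-simulation costs some fixed multiple of the layer below it, verifying N future modifications by nested simulation grows geometrically in N.

```python
def nested_simulation_cost(base_cost: float, overhead: float, depth: int) -> float:
    """Toy model: cost of simulating yourself simulating yourself...
    `depth` levels deep, assuming (hypothetically) that each layer of
    simulation multiplies the required work by a constant factor."""
    return base_cost * overhead ** depth

# Even with a modest 2x overhead per layer, the cost of verifying each
# additional future modification doubles:
costs = [nested_simulation_cost(1.0, 2.0, n) for n in range(5)]
print(costs)  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

The exact overhead factor doesn't matter for the argument; any factor greater than 1 per layer gives exponential growth in the depth of lookahead.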
> > What does it compare *against*?
>
> Originally, it "compares" against humans, later on it improves on the
> information about the initial conditions, renormalizing the concept
> against itself.
For it to compare against humans suggests that it's possible for humans to
specify Friendliness to an AGI, and I have dealt with that elsewhere.
I was expecting you to say that renormalizing continues to occur *against
humans*, not itself. How would it account for the possibility that what humans
consider Friendly changes through time?
Terren
-------------------------------------------
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/