--- On Wed, 8/27/08, Vladimir Nesov <[EMAIL PROTECTED]> wrote:
> One of the main motivations for the fast development of Friendly AI is
> that it can be allowed to develop superintelligence to police the
> human space from global catastrophes like Unfriendly AI, which
> includes as a special case a hacked design of Friendly AI made
> Unfriendly.

That is certainly the most compelling reason to do this kind of research. And I 
wish I had something more to offer than "disallow self-modifying approaches", as 
if that would be enforceable. But I just don't see Friendliness as attainable in 
principle, so I think we should treat this like nuclear weaponry: we do our best 
to prevent it.
 
> If we can understand it and know that it does what we want, we don't
> need to limit its power, because it becomes our power.

Whose power? Who is referred to by "our"? More importantly, whose agenda is 
served by this power? Power corrupts. One culture's good is another's evil. 
What we call Friendly, our political enemies might call Unfriendly. If you 
think no agenda would be served, you're naive. And if you think the AGI would 
somehow know not to serve its masters, in service of Friendliness to humanity, 
then you believe in an objective morality... in a universally compelling 
argument.

> With simulated intelligence, understanding might prove as difficult as
> in neuroscience: studying a resulting design that is unstable and thus
> in the long term Unfriendly. Hacking it to a point of Friendliness
> would be equivalent to solving the original question of Friendliness,
> understanding what you want, and would in fact involve something close
> to hands-on design, so it's unclear how much help experiments can
> provide in this regard relative to the default approach.

Agreed, although I would not advocate hacking Friendliness. I'd advocate 
limiting the simulated environment in which the agent exists. The point of this 
line of reasoning is to avoid the Singularity, period. Perhaps that's every bit 
as unrealistic as I believe Friendliness to be.
 
> It's self-improvement, not self-retardation. If modification is
> expected to make you unstable and crazy, don't do that modification,
> add some redundancy instead and think again.

The question is whether it's possible to know in advance, within the finite 
computational resources available to an AGI, that a modification won't be 
unstable. In the kind of recursive scenarios we're talking about, simulation is 
the only way to guarantee that a modification is an improvement, and an AGI 
simulating its own modified operation requires exponentially increasing 
resources, particularly as it simulates itself simulating itself simulating 
itself, and so on for N future modifications.
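To make the resource argument concrete, here is a toy model (the overhead factor 
k and all numbers are illustrative assumptions, not measurements): if running one 
level of self-simulation costs a constant factor k > 1 relative to running 
natively, then verifying N future modifications by nested simulation costs k^N, 
which outruns any fixed resource budget as N grows.

```python
# Toy model: cost of nested self-simulation, relative to native execution.
# Assumption (illustrative): each simulation level adds a constant
# multiplicative overhead factor k > 1.

def nested_simulation_cost(k: float, n: int) -> float:
    """Resources needed to run n nested levels of self-simulation,
    where running natively (depth 0) costs 1.0."""
    return k ** n

# Even a modest 10% per-level overhead compounds quickly over many
# nested modifications:
for depth in (0, 10, 50, 100):
    print(depth, nested_simulation_cost(1.1, depth))
```

Whatever the actual per-level overhead, the point stands as long as it is 
bounded away from 1: the cost curve is exponential in the depth of nesting.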

> > What does it compare *against*?
> 
> Originally, it "compares" against humans; later on it improves on the
> information about the initial conditions, renormalizing the concept
> against itself.

For it to compare against humans suggests that it's possible for humans to 
specify Friendliness to an AGI, and I have dealt with that elsewhere. 

I was expecting you to say that renormalizing continues to occur *against 
humans*, not itself. How would it account for the possibility that what humans 
consider Friendly changes through time? 
 
Terren

-------------------------------------------
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=8660244&id_secret=111637683-c8fa51
Powered by Listbox: http://www.listbox.com