Agreed, Tim: no sandbox environment can be sufficient for determining benevolence.

Such an environment can only be a heuristic guide. We will gather data about an AGI's benevolence from its behavior in the sandbox and from our knowledge of its internal state, and we will make our best judgment about the AGI's benevolence. No guarantees. I have seen no alternative suggestion in extensive discussions on the SL4 list.

Eliezer has suggested that it is possible to create an AGI system with a special Friendliness-friendly goal architecture, one that makes "Friendliness to humans" a more probable outcome than it would be with Novamente's more flexible goal architecture. I have read his ideas carefully and remain unconvinced.

-- Ben

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Tim Barnard
> Sent: Thursday, January 10, 2002 1:06 PM
> To: [EMAIL PROTECTED]
> Subject: Re: [agi] Friendliness toward humans
>
> > 3) an intention to implement a careful "AGI sandbox" that we won't
> > release our AGI from until we're convinced it is genuinely benevolent
> >
> > -- Ben
>
> Unfortunately, what one says and what one's intent is can be two
> completely different things. It's unlikely, to my mind, that the
> "sandbox" restriction will be a sufficient environment for determining
> benevolence. What would be sufficient? Will we require the AGI to take
> on some kind of physical form with which it can demonstrate to us, over
> some period of time, its benevolent characteristics? Hard to say.
>
> Tim
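
[Editor's note: a minimal Python sketch of the kind of heuristic judgment Ben describes above, pooling behavioral evidence from sandbox runs with readings of the AGI's internal goal state and requiring both to clear a conservative bar. This is purely illustrative, not Novamente's design or any proposal from the thread; every name, type, and threshold here (SandboxObservation, InternalStateReport, judge_benevolence, the 0.9 bar) is a hypothetical stand-in. The point it makes is Ben's: the output is a judgment, not a proof.]

from dataclasses import dataclass
from typing import List

@dataclass
class SandboxObservation:
    scenario: str          # what the AGI was asked to do in the sandbox
    behavior_score: float  # human reviewer's benevolence rating, 0.0 to 1.0

@dataclass
class InternalStateReport:
    goal: str                  # a goal read out of the AGI's goal system
    benevolence_weight: float  # how strongly that goal favors human welfare, 0.0 to 1.0

def judge_benevolence(observations: List[SandboxObservation],
                      internals: List[InternalStateReport],
                      threshold: float = 0.9) -> bool:
    """Heuristic judgment, not a guarantee: average both lines of
    evidence and require each to clear a conservative threshold."""
    if not observations or not internals:
        return False  # no evidence, no release
    behavioral = sum(o.behavior_score for o in observations) / len(observations)
    internal = sum(r.benevolence_weight for r in internals) / len(internals)
    # Require BOTH kinds of evidence: behavior alone can be acted,
    # and internal state alone can be misread.
    return behavioral >= threshold and internal >= threshold

# Example: one cooperative sandbox run plus one benevolent-looking goal
# still fails to clear a 0.9 bar on both axes.
obs = [SandboxObservation("resource-sharing dialogue", 0.95)]
state = [InternalStateReport("preserve human well-being", 0.85)]
print(judge_benevolence(obs, state))  # False: internal evidence below threshold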
