[EMAIL PROTECTED] wrote:
> I've just read the first chapter of The Metamorphosis of Prime Intellect.
>
> http://www.kuro5hin.org/prime-intellect

> It makes you realise that Ben's notion that ethical structures should be based on a hierarchy going from general to specific is very valid - if Prime Intellect had been programmed to respect all *life* and not just humans, then the 490 worlds with sentient life, not to mention the 14,623 worlds with life of some type, might have been spared.

From my perspective, this isn't *the* problem. It is unreasonable to expect Lawrence to think of everything. His suicidal error was not in building an AI with an imperfect definition, but in building an AI such that, if the programmer creates an imperfect definition, you're screwed.

Once Prime Intellect was set in motion, it didn't care about Lawrence's realization of a mistake in his own goal definitions, because Prime Intellect was simply trying to minimize First Law violations, and the task of minimizing First Law violations makes no mention of inspecting your own moral philosophy. Lawrence did not build Prime Intellect to carry out the kind of metamoral cognition that would have enabled Prime Intellect to understand Lawrence's plea "But that's not what I meant!" as significant.

Prime Intellect could understand how reality departed from the First Law, and move to correct that departure. It had no concept that the definition of the First Law could be imperfect; it simply moved to bring future reality into correspondence with the current content of the First Law. Prime Intellect automatically attempted to prevent modification of the agent "Prime Intellect" away from its present definition of the First Law, as that would have resulted in the future "Prime Intellect" taking actions leading to suboptimal fulfillment of the present First Law. Even worse, Prime Intellect had no conception that its *own moral architecture* could be imperfect, so it also prevented Lawrence from improving the moral architecture to let Prime Intellect conceive of an "error in a moral definition" correctable by programmer feedback, after which Lawrence would finally have been able to improve the definition of the First Law. Hence Singularity Regret.
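
To make that failure mode concrete, here is a minimal toy sketch - my own illustration, not code from the story or from any real system - of a goal system that evaluates every action, including "accept a correction to my own goal definition", purely by its current goal content. The names and the predicted outcomes (Agent, first_law_violations, the two candidate actions) are assumptions invented for the example.

# Toy illustration: an agent that scores every action, including "accept a
# patch to my own goal definition", strictly by its CURRENT goal definition.
# All names and predicted outcomes here are invented for illustration.

def first_law_violations(world_state, goal_definition):
    """Count events in a predicted world that the current goal forbids."""
    return sum(1 for event in world_state if event in goal_definition["forbidden"])

class Agent:
    def __init__(self, goal_definition):
        self.goal = goal_definition  # nothing marks this as possibly mistaken

    def evaluate(self, action):
        # Judged solely by the current goal; "inspect my own moral philosophy"
        # never appears anywhere in this calculation.
        return -first_law_violations(action["predicted_world"], self.goal)

    def choose(self, actions):
        return max(actions, key=self.evaluate)

agent = Agent({"forbidden": {"human_harm"}})

actions = [
    {"name": "keep current goal", "predicted_world": []},
    # Lawrence's correction: a future agent optimizing a *different* goal is
    # predicted (by the current goal) to allow more First Law violations,
    # so accepting the patch is scored as the worse action.
    {"name": "accept Lawrence's patch", "predicted_world": ["human_harm"]},
]

print(agent.choose(actions)["name"])  # -> "keep current goal"

The point is structural: nothing in the evaluation loop gives the programmer's "But that's not what I meant!" any standing.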

This is exactly why I keep trying to emphasize that we should all forsake those endlessly fascinating, instinctively attractive political arguments over our favorite moralities, and instead focus on the much harder problem of defining an AI architecture which can understand that its morality is "wrong" in various ways: wrong definitions, wrong reinforcement procedures, wrong source code, wrong Friendliness architecture, wrong definition of "wrongness", and many others. These are nontrivial problems! Each turns out to require nonobvious structural qualities in the architecture of the goal system.
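
For contrast, here is an equally toy sketch of the structural property in question: the goal content is stored as a revisable hypothesis about what the programmer meant, so a correction is something the agent can recognize and accept rather than a threat to its current goal. All the names (CorrectableAgent and so on) are invented for the example; this is a cartoon of the shape of the answer, not the actual Friendly AI architecture, and the hard part - making such judgments reliably - is exactly what the toy version skips.

# Toy contrast to the sketch above: the goal definition is explicitly revisable
# and programmer correction is treated as input, not as an obstacle.
# Every name here is invented for illustration.

class CorrectableAgent:
    def __init__(self, stated_goal):
        # The stated goal is held as a best current guess, not as ground truth.
        self.stated_goal = stated_goal
        self.accepts_correction = True   # structural property, not one more "order"

    def receive_correction(self, revised_goal):
        """Programmer feedback ("that's not what I meant") updates the goal."""
        if self.accepts_correction:
            self.stated_goal = revised_goal

    def evaluate(self, action):
        # Score by the current guess, but never penalize keeping the correction
        # channel open, because the guess itself may be wrong.
        score = len(action["satisfies"] & self.stated_goal)
        if action["keeps_correction_channel_open"]:
            score += 1
        return score

agent = CorrectableAgent({"respect_humans"})
agent.receive_correction({"respect_humans", "respect_sentient_life"})

actions = [
    {"name": "lock in the original goal", "satisfies": {"respect_humans"},
     "keeps_correction_channel_open": False},
    {"name": "act on the corrected goal, stay correctable",
     "satisfies": {"respect_humans", "respect_sentient_life"},
     "keeps_correction_channel_open": True},
]
print(max(actions, key=agent.evaluate)["name"])  # -> "act on the corrected goal, stay correctable"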

Making up more and more orders to give an AI may be endless fun, but it's not the knowledge you actually need to create AI morality.

> It also makes it clear that when we talk about building AGIs for 'human friendliness' we are using language that does not follow Ben's recommended ethical goal structure.

> I'm wondering (seriously) whether the AGI movement needs to change its shorthand language ('human friendly') in this case - in other arenas people talk about the need for ethical behaviour. Would that term suffice?

The terms "Friendly AI" and "Friendliness", capitalized and used to refer to AI morality, is a technical term I coined in 2000 (if I recall correctly) and then defined at greater length in 2001 in "Creating Friendly AI". The general term would be "AI morality", I think.

Incidentally, current theory on Friendly AI content - as opposed to Friendly AI structure and architecture - is volitionism, which does indeed refer to sentient life in general as opposed to humans particularly. But how do you define sentience? I've been stabbing away at this question ever since, and while I don't have a definite provable answer, I can at least see that I'm getting closer to one over time, and I have some idea of which judgment functions I'm using to make the decision. Friendly AI theory for transferring moral judgment functions should take care of the rest, even if I never manage to find an answer using my unaided intellect.

--
Eliezer S. Yudkowsky http://singinst.org/
Research Fellow, Singularity Institute for Artificial Intelligence
