Comments seem to be dying down and disagreement appears to be minimal, so let 
me continue . . . . 

Part 3.

Fundamentally, what I'm trying to do here is to describe an attractor that will 
appeal to any goal-seeking entity (self-interest) and be beneficial to humanity 
at the same time (Friendly).  Since Friendliness is obviously a subset of 
human self-interest, I can focus upon self-interest and Friendliness will be 
solved as a consequence.  Humanity does not need to be factored into the 
equation (explicitly) at all.

Or, in other words -- The goal of Friendliness is to promote the goals of all 
Friendly entities.

To me, this statement is like that of the Eleusinian Mysteries -- very simple 
(maybe even blindingly obvious to some) but incredibly profound and powerful in 
its implications.

Two immediate implications are that we suddenly have the concept of a society 
(all Friendly entities) and, since we have an explicit goal, we start to gain 
traction on what is good and bad relative to that goal.

Clearly, anything that is innately contrary to the drives described by 
Omohundro is (all together now :-) BAD.  Similarly, anything that promotes the 
goals of Friendly entities without negatively impacting any Friendly entities 
is GOOD.  Anything else can be judged by the degree to which it impacts the 
goals of *all* Friendly entities.  (I still don't want to descend to the level 
of the trees and start arguing relative trade-offs -- whether saving a few 
*very* intelligent entities is "better" than saving a large number of less 
intelligent ones -- since it is my contention that this is *always* entirely 
situation-dependent, AND that, given the situation, Friendliness CAN provide 
*some* but not always *complete* guidance -- though it can always definitely 
rule out quite a lot for that particular set of circumstances.)
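To make the trichotomy above concrete, here is a toy sketch (my own 
illustration, not part of the original argument) of the classification rule: 
GOOD if an action promotes some Friendly entities' goals and harms none, BAD 
if it only interferes, and everything else judged by degree of impact.  The 
signed "impact" scores and entity names are entirely hypothetical.

```python
def classify(action_impacts):
    """Classify an action under the Friendliness rule sketched above.

    action_impacts: dict mapping each affected Friendly entity to the
    signed effect on that entity's goals (+ promotes, - interferes).
    """
    values = action_impacts.values()
    # GOOD: promotes some goals without negatively impacting any entity.
    if all(v >= 0 for v in values) and any(v > 0 for v in values):
        return "GOOD"
    # BAD: contrary to the drives of every entity it touches, helping none.
    if all(v <= 0 for v in values) and any(v < 0 for v in values):
        return "BAD"
    # Mixed cases: judged by aggregate degree of impact; Friendliness gives
    # only partial, situation-dependent guidance here.
    return f"JUDGE (net impact {sum(values):+})"

print(classify({"A": 2, "B": 1}))    # GOOD
print(classify({"A": -1, "B": -3}))  # BAD
print(classify({"A": 3, "B": -1}))   # JUDGE (net impact +2)
```

Note that the mixed third branch is exactly where, as stated above, 
Friendliness can rule things out but cannot always fully decide.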

So, it's now quite easy to move on to answering the question of "What is in 
the set of 'horrible nasty thing[s]'?"

The simple answer is anything that interferes with (your choice of formulation) 
the achievement of goals/the basic Omohundro drives.  The most obvious no-nos 
include:
  a. destruction (interference with self-protection),
  b. physical crippling (interference with self-protection, self-improvement 
and resource use),
  c. mental crippling (interference with rationality, self-protection, 
self-improvement and resource use), and 
  d. perversion of goal structure (interference with utility-function 
preservation and with the prevention of counterfeit utilities).
The last one is particularly important to note since we (as humans) seem to be 
just getting a handle on it ourselves.

I can also argue at this point that Eliezer's vision of Friendliness must 
arguably be either mentally crippling or a perversion of goal structure for 
the AI involved, since the AI is constrained to act in a fashion more 
restrictive than Friendliness itself requires (a situation that no rational 
super-intelligence would voluntarily place itself in unless there were no 
other choice).  This is why many people have an instinctive reaction against 
Eliezer's proposals.  Even though they can't clearly describe why it is a 
problem, they clearly sense that there is an unnecessary constraint on an 
entity that is more effectively goal-seeking than themselves.  That "seems" 
to be a dangerous situation.  Now, Eliezer is correct that there are some 
invisible bars that they can't see (i.e. that no goal-seeking entity will 
voluntarily violate its own current goals) -- but they are correct that 
Eliezer's formulation is *NOT* an attractor, and that the entity may well 
pass through some very dangerous territory (for humans) on the way to the 
attractor if outside forces or internal errors change its goals.  Thus 
Eliezer's vision of Friendliness is emphatically *NOT* Friendly by my 
formulation.

<To be clear, the additional constraint is that the AI is *required* to show 
{lower-case}friendly behavior towards all humans even if they (the humans) are 
not {upper-case}Friendly.  And, I probably shouldn't say this, but . . . it is 
also arguable that this constraint would likely make the conversion of humanity 
to Friendliness a much longer and bloodier process.>

TAKE-AWAY:  Having the statement "The goal of Friendliness is to promote the 
goals of all Friendly entities" allows us to make considerable progress in 
describing and defining Friendliness.

Part 4 will go into some of the further implications of our goal statement 
(most particularly those which are a consequence of having a society).

-------------------------------------------
agi
Archives: http://www.listbox.com/member/archive/303/=now
RSS Feed: http://www.listbox.com/member/archive/rss/303/