Comments seem to be dying down and disagreement appears to be minimal, so let
me continue . . . .
Part 3.
Fundamentally, what I'm trying to do here is to describe an attractor that will
appeal to any goal-seeking entity (self-interest) and be beneficial to humanity
at the same time (Friendly). Since Friendliness is obviously a subset of human
self-interest, I can focus upon the former and the latter will be solved as a
consequence. Humanity does not need to be factored into the equation
(explicitly) at all.
Or, in other words -- The goal of Friendliness is to promote the goals of all
Friendly entities.
To me, this statement is like that of the Eleusinian Mysteries -- very simple
(maybe even blindingly obvious to some) but incredibly profound and powerful in
its implications.
Two immediate implications are that we suddenly have the concept of a society
(all Friendly entities) and, since we have an explicit goal, we start to gain
traction on what is good and bad relative to that goal.
Clearly, anything that is innately contrary to the drives described by
Omohundro is (all together now :-) BAD. Similarly, anything that promotes the
goals of Friendly entities without negatively impacting any Friendly entities
is GOOD. Anything else can be judged by the degree to which it impacts the
goals of *all* Friendly entities (though I still don't want to descend to the
level of the trees and start arguing the relative trade-offs of whether saving
a few *very* intelligent entities is "better" than saving a large number of
less intelligent entities, since it is my contention that this is *always*
entirely situation-dependent AND that, given the situation, Friendliness
CAN provide *some* but not always *complete* guidance -- though it can always
definitely rule out quite a lot for that particular set of circumstances).
So, it's now quite easy to move on to answering the question of "What is in the
set of 'horrible nasty thing[s]'?"
The simple answer is anything that interferes with (your choice of formulation)
the achievement of goals/the basic Omohundro drives. The most obvious no-nos
include:
a. destruction (interference with self-protection),
b. physical crippling (interference with self-protection, self-improvement,
and resource use),
c. mental crippling (interference with rationality, self-protection,
self-improvement, and resource use), and
d. perversion of goal structure (interference with utility-function
preservation and prevention of counterfeit utilities).
The last one is particularly important to note since we (as humans) seem to be
just getting a handle on it ourselves.
I can also argue at this point that Eliezer's vision of Friendliness is
arguably either mentally crippling or a perversion of goal-structure for the
AI involved, since the AI is constrained to act in a fashion that is more
restrictive than Friendliness requires (a situation that no rational
super-intelligence would voluntarily place itself in unless there were no
other choice). This is why many people have an instinctive reaction against
Eliezer's proposals. Even though they can't clearly articulate why it is a
problem, they clearly sense that there is an unnecessary constraint on an
entity that is a more effective goal-seeker than themselves. That "seems" to
be a dangerous situation. Now, Eliezer is correct that there actually are some
invisible bars that these critics can't see (i.e. that no goal-seeking entity
will voluntarily violate its own current goals) -- but they are correct that
Eliezer's formulation is *NOT* an attractor, and that the entity may well pass
through some very dangerous territory (for humans) on the way to the attractor
if outside forces or internal errors change its goals. Thus, Eliezer's vision
of Friendliness is emphatically *NOT* Friendly by my formulation.
<To be clear, the additional constraint is that the AI is *required* to show
{lower-case}friendly behavior towards all humans even if they (the humans) are
not {upper-case}Friendly. And, I probably shouldn't say this, but . . . it is
also arguable that this constraint would likely make the conversion of humanity
to Friendliness a much longer and bloodier process.>
TAKE-AWAY: Having the statement "The goal of Friendliness is to promote the
goals of all Friendly entities" allows us to make considerable progress in
describing and defining Friendliness.
Part 4 will go into some of the further implications of our goal statement
(most particularly those which are a consequence of having a society).
-------------------------------------------
agi
Archives: http://www.listbox.com/member/archive/303/=now
RSS Feed: http://www.listbox.com/member/archive/rss/303/