On 03/07/2008 08:09 AM, Mark Waser wrote:
There is one unique attractor in state space.
No. I am not claiming that there is one unique attractor. I am
merely saying that there is one describable, reachable, stable
attractor that has the characteristics that we want. There are
*clearly* other attractors. For starters, my attractor requires
sufficient intelligence to recognize its benefits. There is
certainly another very powerful attractor for simpler, brute force
approaches (which frequently have long-term disastrous consequences
that aren't seen or are ignored).
Of course. An earlier version said "there is one unique attractor that
<identify friendliness here>", and while editing it somehow ended up in
that obviously wrong form.
Since any sufficiently advanced species will eventually be drawn
towards F, the CEV of all species is F.
While I believe this to be true, I am not convinced that it is
necessary for my argument. I think that it would make my argument a
lot easier if I could prove it to be true -- but I currently don't see
a way to do that. Anyone want to chime in here?
Ah, okay. I thought you were going to argue this following on from
Omohundro's paper about drives common to all sufficiently advanced AIs
and extend it to all sufficiently advanced intelligences, but that's my
hallucination.
Therefore F is not species-specific, and has nothing to do with any
particular species or the characteristics of the first species that
develops an AGI (AI).
I believe that the F that I am proposing is not species-specific. My
problem is that there may be another attractor F' existing somewhere
far off in state space that some other species might start out close
enough to that it would be pulled into that attractor instead. In
that case, there would be the question as to how the species in the
two different attractors interact. My belief is that it would be to
the mutual benefit of both but I am not able to prove that at this time.
For there to be another attractor F', it would of necessity have to be
an attractor that is not desirable to us, since you said there is only
one stable attractor for us that has the desired characteristics. I
don't see how beings subject to these two different attractors would
find mutual benefit in general, since if they did, then F' would have
the desirable characteristics that we wish a stable attractor to have,
but it doesn't.
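The two-attractor picture being debated here can be made concrete with a toy dynamical system. This is purely illustrative (none of the numbers or the potential come from the thread): a 1-D gradient flow with two stable fixed points, where the starting position alone determines which attractor a trajectory is pulled into, just as a species' starting point in state space is claimed to determine whether it reaches F or F'.

```python
# Toy sketch (hypothetical, not from the thread): the potential
# V(x) = (x^2 - 1)^2 has two stable attractors, at x = -1 (think "F'")
# and x = +1 (think "F"). Which one a trajectory settles into depends
# only on which basin of attraction it starts in.

def settle(x, steps=1000, dt=0.01):
    """Follow the gradient flow dx/dt = -V'(x) until it settles."""
    for _ in range(steps):
        x -= dt * 4 * x * (x * x - 1)  # -V'(x) = -4x(x^2 - 1)
    return x

# A trajectory starting at x = 0.2 is drawn to the attractor at +1,
# while one starting at x = -0.2 ends up at -1.
print(round(settle(0.2), 3), round(settle(-0.2), 3))  # → 1.0 -1.0
```

The point of the sketch is only that "close enough to be pulled in" is a well-defined notion: each attractor has a basin, and nothing in the dynamics themselves says the two basins' occupants must benefit each other.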
This means that genuine conflict between friendly species or between
friendly individuals is not even possible, so there is no question of
an AI needing to arbitrate between the conflicting interests of two
friendly individuals or groups of individuals. Of course, there will
still be conflicts between non-friendlies, and the AI may arbitrate
and/or intervene.
Wherever/whenever there is a shortage of resources (i.e. not all goals
can be satisfied), goals will conflict. Friendliness describes the
behavior that should result when such conflicts arise. Friendly
entities should not need arbitration or intervention but should
welcome help in determining the optimal solution (which is *close* to
arbitration but subtly different in that it is not adversarial). I
would rephrase your general point as: a true adversarial relationship
is not even possible.
That's a better way of putting it. Conflicts will be possible, but
they'll always be resolved via exchange of information rather than bullets.
The AI will not be empathetic towards homo sapiens sapiens in
particular. It will be empathetic towards f-beings (friendly beings
in the technical sense), whether they exist or not (since the AI
might be the only being anywhere near the attractor).
Yes. It will also be empathic towards beings with the potential to
become f-beings because f-beings are a tremendous resource/benefit.
You've said elsewhere that the constraints on how it deals with
non-friendlies are rather minimal, so while it might be
empathic/empathetic, it will still have no qualms about kicking ass and
inflicting pain where necessary.
This means no specific acts of the AI towards any species or
individuals are ruled out, since it might be part of their CEV (which
is the CEV of all beings), even though they are not smart enough to
realize it.
Absolutely correct and dead wrong at the same time. You could invent
specific, incredibly low-probability but possible circumstances where
*any* specific act is justified. I'm afraid that my vision of
Friendliness certainly does permit the intentional destruction of the
human race if that is the *only* way to preserve a hundred more
intelligent, more advanced, more populous races. On the other hand,
given the circumstance space that we are likely to occupy with a huge
certainty, the intentional destruction of the human race is most
certainly ruled out. Or, in other words, there are no infinite
guarantees, but we can reduce the dangers to infinitesimally small levels.
I think you're fudging a bit here. If we are only likely to occupy the
circumstance space with probability less than 1, then the intentional
destruction of the human race is not 'most certainly ruled out': it is
ruled out with a very high probability that is still less than 1. I'm
not trying to say it's likely, only that it's possible. I make this
point to distinguish
your approach from other approaches that purport to make absolute
guarantees about certain things (as in some ethical systems where
certain things are *always* wrong, regardless of context or circumstance).
Since the AI empathizes not with humanity but with f-beings in
general, it is possible (likely) that some of humanity's most
fundamental beliefs may be wrong from the perspective of an f-being.
Absolutely. Jihad is fundamentally wrong from the perspective of an
f-being. A jihadist is *not* an f-being. Its actions are entirely
contrary to the tenets of Friendly action.
And we are not yet f-beings in general, since our current location in
state space is so far from F. Or do you believe that some (many?) of us
are close to F?
Without getting into the debate of the merits of virtual-space versus
meat-space and uploading, etc., it seems to follow that *if* the view
that everything of importance is preserved (no arguments about this,
it is an assumption for the sake of this point only) in virtual-space
and *if* turning the Earth into computronium and uploading humanity
and all of Earth's beings would be vastly more efficient a use of the
planet, *then* the AI should do this (perhaps would be morally
obligated to do this) -- even if every human being pleads for this
not to occur. The AI would have judged that if we were only smarter,
faster, more the kind of people we would like to be, etc., we would
actually prefer the computronium scenario.
The weak point of this argument lies in the phrase "the AI would have
judged that if <any clause>, we would actually prefer <any clause>".
Extrapolation is a tremendously error-prone process and what the AI is
attempting to do here *absolutely requires* that it has a better
knowledge of YOUR goals than you do for this to be a Friendly act. We
justifiably do this all the time when we do unpleasant things for our
child's health. But, the intelligent parent (or Friendly entity) does
not do such things without a really high probability that they are
correct.
Note: I realize that this is going to be a point of much
unhappiness/contention/debate and there will be endless arguments as
to exactly where the line is. This is all well and good but I hope
that we don't lose the forest for the trees (this is why I'm not doing
math at this point). This specific case ends up with an inflammatory
conclusion because it starts out by ASSUMING an equally inflammatory
premise (i.e. that all human beings are incorrect about their goals).
I would argue that this is simply a case of garbage in, garbage out.
I don't think it's inflammatory or a case of garbage in to contemplate
that all of humanity could be wrong. For much of our history, there have
been things that *every single human was wrong about*. This is merely
the assertion that we can't make guarantees about what vastly superior
f-beings will find to be the case. We may one day outgrow our attachment
to meatspace, and we may be wrong in our belief that everything
essential can only be preserved in meatspace, but we might not be at that
point yet when the AI has to make the decision.
It's become apparent to me in thinking about this that 'friendliness'
is really not a good term for the attitude of an f-being that we are
talking about: that of acting solely in the interest of f-beings
(whether others exist or not) and in consistency with the CEV of all
sufficiently ... beings. It is really just acting rationally
(according to a system that we do not understand and may vehemently
disagree with).
Actually, I would argue that Friendliness is a good term because that
is the net result to us if we are Friendly; however, a possibly better
term is simply "enlightened self-interest" since that describes why an
f-being would want to act that way (i.e. why Friendliness is an
attractor).
Yes, when you talk about Friendliness as that distant attractor, it
starts to sound an awful lot like "enlightenment", where self-interest
is one aspect of that enlightenment, and friendly behavior is another
aspect.
:-) I haven't addressed this question yet but the short answer is
that there is no requirement for intervention (for a variety of
reasons that I haven't established on this forum the necessary
groundwork to easily explain).
Looking forward to it. Thanks for the detailed response.
joseph
-------------------------------------------
agi
Archives: http://www.listbox.com/member/archive/303/=now
RSS Feed: http://www.listbox.com/member/archive/rss/303/