On 03/07/2008 03:20 PM, Mark Waser wrote:
>> For there to be another attractor F', it would of necessity have to be
>> an attractor that is not desirable to us, since you said there is only
>> one stable attractor for us that has the desired characteristics.
> Uh, no. I am not claiming that there is *ONLY* one unique attractor
> (that has the desired characteristics). I am merely saying that there
> is *AT LEAST* one describable, reachable, stable attractor that has
> the characteristics that we want. (Note: I've clarified a previous
> statement by adding the *ONLY* and *AT LEAST* and the
> parenthetical expression "that has the desired characteristics".)
Okay, got it now. At least one, not exactly one.
> I really don't like the particular qualifier "rather minimal". I
> would argue (and will later attempt to prove) that the constraints are
> still actually as close to Friendly as rationally possible, because
> that is the most rational way to move non-Friendlies to a Friendly
> status (which is a major Friendliness goal that I'll be getting to
> shortly). The Friendly will indeed "have no qualms about kicking ass
> and inflicting pain *where necessary*", but the "where necessary"
> clause is critically important, since a Friendly shouldn't resort to
> this (even for Unfriendlies) until it is truly necessary.
Fair enough. "rather minimal" is much too strong a phrase.
>> I think you're fudging a bit here. If we are only likely to occupy the
>> circumstance space with probability less than 1, then the intentional
>> destruction of the human race is not 'most certainly ruled out': it is
>> ruled out with very high probability less than 1. I'm not trying to say
>> it's likely; only that it's possible. *I make this point to distinguish
>> your approach from other approaches that purport to make absolute
>> guarantees about certain things (as in some ethical systems where
>> certain things are *always* wrong, regardless of context or
>> circumstance).*
> Um. I think that we're in violent agreement. I'm not quite sure
> where you think I'm fudging.
The reason I thought you were fudging was that I thought you were saying
that it is absolutely certain that the AI will never turn the planet
into computronium and upload us *AND* there are no absolute guarantees.
I guess I was misled when I read "given the circumstance space that we
are likely to occupy with a huge certainty, the intentional destruction
of the human race is most certainly ruled out" as meaning 'turning earth
into computronium is certainly ruled out'. It's only ruled out
*assuming* the region of circumstance space that we are highly likely
to inhabit. So yeah, I guess we do agree.
This raises another point for me though. In another post (2008-03-06
14:36) you said:
"It would *NOT* be Friendly if I have a goal that I not be turned into
computronium even if <your clause> (which I hereby state that I do)"
Yet, if I understand our recent exchange correctly, it is possible for
this to occur and be a Friendly action regardless of what sub-goals I
may or may not have. (It's just extremely unlikely given ..., which is
an important distinction.) It would be nice to have some ballpark
probability estimates, though, so we know what we mean by "extremely
unlikely": 10^-6 is a very different beast from 10^-1000.
>> I don't think it's inflammatory or a case of garbage in to contemplate
>> that all of humanity could be wrong. For much of our history, there have
>> been things that *every single human was wrong about*. This is merely
>> the assertion that we can't make guarantees about what vastly superior
>> f-beings will find to be the case. We may one day outgrow our attachment
>> to meatspace, and we may be wrong in our belief that everything
>> essential can be preserved in meatspace, but we might not be at that
>> point yet when the AI has to make the decision.
> Why would the AI *have* to make the decision? It shouldn't be for
> its own convenience. The only circumstance that I could think of
> where the AI should make such a decision *for us*, over our
> objections, is if we would be destroyed otherwise (but there was no way
> for it to convince us of this fact before the destruction was inevitable).
It might not *have* to. I'm only saying it's possible. And it would
almost certainly be for some circumstance that has not occurred to us,
so I can't give you a specific scenario. Not being able to find such a
scenario is different, though, from there not actually being one. In
order to believe the latter, a proof is required.
>> Yes, when you talk about Friendliness as that distant attractor, it
>> starts to sound an awful lot like "enlightenment", where self-interest
>> is one aspect of that enlightenment, and friendly behavior is another
>> aspect.
> Argh! I would argue that Friendliness is *not* that distant. Can't
> you see how the attractor that I'm describing is both self-interested
> and Friendly, because *ultimately they are the same thing*? (OK, so
> maybe that *IS* enlightenment :-)
Well, I was thinking of the region of state space close to the attractor
as being a sort of "approaching perfection" region in terms of certain
desirable qualities and capabilities, and I don't think we're really
close to that. Having said that, I'm by temperament a pessimist and a
skeptic, but I would go along with "heading in the right direction".
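(A tangent, in case the attractor language is too abstract for anyone
following along: here's a toy, purely illustrative one-dimensional
dynamical system, nothing specific to Friendliness, where x = 1 is a
unique stable attractor and every starting point "heads in the right
direction" toward it. The system and all names are my invention, just
to pin down what "describable, reachable, stable attractor" means.)

```python
# Toy illustration: a 1-D system with a single stable attractor at x = 1.
# Each step is gradient descent on V(x) = (x - 1)^2 / 2, whose minimum
# is the attractor; trajectories from any start converge to it.

def step(x, rate=0.1):
    # Move a small fraction of the way toward the fixed point x = 1.
    return x - rate * (x - 1.0)

def trajectory(x0, n=200):
    # Iterate the dynamics n times from starting state x0.
    x = x0
    for _ in range(n):
        x = step(x)
    return x

for start in (0.01, 0.5, 5.0, 100.0):
    print(start, "->", round(trajectory(start), 4))
```

Started from wildly different points in the state space, every
trajectory ends up (numerically) at 1.0; "stable" just means that small
perturbations get pulled back rather than amplified.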
-joseph