On 03/07/2008 03:20 PM, Mark Waser wrote:
>> For there to be another attractor F', it would of necessity have to be
>> an attractor that is not desirable to us, since you said there is only
>> one stable attractor for us that has the desired characteristics.
> Uh, no. I am not claiming that there is *ONLY* one unique attractor (that has the desired characteristics). I am merely saying that there is *AT LEAST* one describable, reachable, stable attractor that has the characteristics that we want. (Note: I've clarified a previous statement by adding the *ONLY* and *AT LEAST* and the parenthetical expression "that has the desired characteristics".)

Okay, got it now. At least one, not exactly one.
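
For intuition, here's a toy example of my own (nothing from your framework, just the textbook picture): even the simplest gradient system can have more than one stable attractor, so exhibiting one stable attractor with the desired characteristics says nothing about uniqueness.

# Toy illustration (mine, not Mark's model): gradient descent on
# V(x) = (x**2 - 1)**2, which has two stable attractors, x = -1 and x = +1.
def step(x, dt=0.01):
    dVdx = 4 * x * (x**2 - 1)   # V'(x)
    return x - dt * dVdx

for x0 in (-2.0, -0.5, 0.5, 2.0):
    x = x0
    for _ in range(10000):
        x = step(x)
    print("start %+.1f -> settles near %+.3f" % (x0, x))

Starts left of zero settle at -1 and starts right of zero settle at +1: two perfectly stable attractors in one system, so "at least one" and "exactly one" really do come apart.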

> I really don't like the particular quantifier "rather minimal". I would argue (and will later attempt to prove) that the constraints are still actually as close to Friendly as rationally possible because that is the most rational way to move non-Friendlies to a Friendly status (which is a major Friendliness goal that I'll be getting to shortly). The Friendly will indeed "have no qualms about kicking ass and inflicting pain *where necessary*" but the "where necessary" clause is critically important since a Friendly shouldn't resort to this (even for Unfriendlies) until it is truly necessary.

Fair enough. "rather minimal" is much too strong a phrase.
>> I think you're fudging a bit here. If we are only likely to occupy the
>> circumstance space with probability less than 1, then the intentional
>> destruction of the human race is not 'most certainly ruled out': it is
>> ruled out with very high probability less than 1. I'm not trying to say
>> it's likely; only that it's possible. I make this point to distinguish
>> your approach from other approaches that purport to make absolute
>> guarantees about certain things (as in some ethical systems where
>> certain things are *always* wrong, regardless of context or circumstance).
> Um. I think that we're in violent agreement. I'm not quite sure where you think I'm fudging.

The reason I thought you were fudging was that I thought you were saying both that it is absolutely certain that the AI will never turn the planet into computronium and upload us *AND* that there are no absolute guarantees. I guess I was misled when I read "given the circumstance space that we are likely to occupy with a huge certainty, the intentional destruction of the human race is most certainly ruled out" as meaning 'turning earth into computronium is certainly ruled out'. It's only certainly ruled out *assuming* we stay within the highly likely region of circumstance space. So yeah, I guess we do agree.
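
To pin down exactly what we agree on, here's the law-of-total-probability version (every number below is made up by me, purely for illustration):

# Made-up numbers: "ruled out inside the likely region" != "ruled out outright".
p_likely_region   = 1 - 1e-9   # we almost surely stay in familiar circumstances
p_bad_if_likely   = 0.0        # destruction ruled out *within* that region
p_bad_if_unlikely = 0.01       # no guarantee outside it
p_bad = (p_bad_if_likely * p_likely_region
         + p_bad_if_unlikely * (1 - p_likely_region))
print(p_bad)   # ~1e-11: tiny, but not zero, so no absolute guarantee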

This raises another point for me though. In another post (2008-03-06 14:36) you said:

"It would *NOT* be Friendly if I have a goal that I not be turned into computronium even if <your clause> (which I hereby state that I do)"

Yet, if I understand our recent exchange correctly, it is possible for this to occur and be a Friendly action regardless of what sub-goals I may or may not have. (It's just extremely unlikely given ..., which is an important distinction.) It would be nice to have some ballpark probability estimates, though, to know what we mean by extremely unlikely. 10^-6 is a very different beast than 10^-1000.
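
To put rough numbers on that (the 10**18 decision count below is an arbitrary assumption of mine): over N independent chances, the expected number of occurrences is p*N, and it's easiest to compare in log10 space since 10^-1000 underflows an ordinary float to zero.

import math

# Illustrative only: N is a made-up number of "relevant decisions".
def log10_expected(log10_p, n):
    return log10_p + math.log10(n)   # log10(p * n) = log10(p) + log10(n)

N = 10**18
for log10_p in (-6, -1000):
    print("p = 10^%d: expected occurrences ~ 10^%.0f"
          % (log10_p, log10_expected(log10_p, N)))

At 10^-6 the event is all but certain to happen eventually (about 10^12 expected occurrences here); at 10^-1000 it never happens for any physically plausible N. Those are morally very different situations, which is why the ballpark matters.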


>> I don't think it's inflammatory or a case of garbage in to contemplate
>> that all of humanity could be wrong. For much of our history, there have
>> been things that *every single human was wrong about*. This is merely
>> the assertion that we can't make guarantees about what vastly superior
>> f-beings will find to be the case. We may one day outgrow our attachment
>> to meatspace, and we may be wrong in our belief that everything
>> essential can be preserved in meatspace, but we might not be at that
>> point yet when the AI has to make the decision.
> Why would the AI *have* to make the decision? It shouldn't be for its own convenience. The only circumstance that I could think of where the AI should make such a decision *for us* over our objections is if we would be destroyed otherwise (but there was no way for it to convince us of this fact before the destruction was inevitable).
It might not *have* to. I'm only saying it's possible. And it would almost certainly be for some circumstance that has not occurred to us, so I can't give you a specific scenario. Not being able to find such a scenario is different, though, from there not actually being one. In order to believe the latter, a proof is required.
>> Yes, when you talk about Friendliness as that distant attractor, it
>> starts to sound an awful lot like "enlightenment", where self-interest
>> is one aspect of that enlightenment, and friendly behavior is another
>> aspect.
> Argh! I would argue that Friendliness is *not* that distant. Can't you see how the attractor that I'm describing is both self-interest and Friendly because *ultimately they are the same thing*? (OK, so maybe that *IS* enlightenment :-)
Well, I was thinking of the region of state space close to the attractor as being a sort of "approaching perfection" region in terms of certain desirable qualities and capabilities, and I don't think we're really close to that. Having said that, I'm by temperament a pessimist and a skeptic, but I would go along with "heading in the right direction".

-joseph
