What is different in my theory is that it handles the case where "the dominant theory turns unfriendly". The core of my thesis is that the particular Friendliness that I/we are trying to reach is an "attractor" -- which means that if the dominant structure starts to turn unfriendly, it is actually a self-correcting situation.


Can you explain it without using the word "attractor"?

Sure! Friendliness is a state which promotes an entity's own goals; therefore, any entity will generally and voluntarily attempt to return to that (Friendly) state, since it is in its own self-interest to do so. The fact that Friendliness is also beneficial to us is why we desire it as well.
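To make the "attractor" picture concrete, here is a minimal toy sketch (my own illustration, not part of the original argument). It assumes, as the paragraph above does, that the entity's own utility peaks at the fully Friendly state; a purely self-interested hill-climber then drifts back toward that state after its goals are perturbed, which is all "attractor" is meant to convey here. The utility function and step size are made up for illustration.

# Toy sketch (illustrative assumption): "Friendliness" as a scalar f in [0, 1],
# with the entity's own utility U(f) peaking at the fully Friendly state f = 1.

def utility(f: float) -> float:
    """Assumed self-interest utility, maximized at the Friendly state f = 1."""
    return -(f - 1.0) ** 2

def self_interested_step(f: float, step: float = 0.05) -> float:
    """Move in whichever direction improves the agent's own utility."""
    up, down = min(f + step, 1.0), max(f - step, 0.0)
    return up if utility(up) >= utility(down) else down

f = 0.3                  # an error or outside force has perturbed the goals
for _ in range(20):      # purely self-interested updates...
    f = self_interested_step(f)
print(round(f, 2))       # ...return the state to 1.0, the "attractor"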

I can't see why a sufficiently intelligent system without "brittle" constraints should be unable to do that.

Because it may not *want* to. If an entity with Eliezer's view of Friendliness has its goals altered either by error or by an exterior force, it is not going to *want* to return to the Eliezer-Friendliness goals since they are not in the entity's own self-interest.

I have come to believe that if we have a sufficiently intelligent AGI that can understand what we mean by saying "friendly AI", we can force this AGI to actually produce a verified friendly AI, with minimal risk of it being defective or a Trojan horse of our captive ad-hoc AGI, after which we place this friendly AI in a dominant position.

I believe that if you have a sufficiently intelligent AGI that can understand what you mean by saying "Friendly AI", there is a high probability that you can't "FORCE" it to do anything.

I believe that if I have a sufficiently intelligent AGI that can understand what I mean by saying "Friendly", it will *voluntarily* (if not gleefully) convert itself to Friendliness.

So the problem of friendly AI comes down to producing a sufficiently intelligent ad-hoc AGI (which will probably have to be not that ad-hoc to be sufficiently intelligent).

Actually, I believe that it's either an easy two-part problem or a more difficult one-part problem. Either you have to be able to produce an AI that is intelligent enough to figure out Friendliness on its own (the more difficult one-part problem that you propose), OR you merely have to be able to figure out Friendliness yourself and have an AI that is smart enough to understand it (the easier two-part problem that I suggest).

I don't see why we should create an AGI that we can't extract useful
things from (although it doesn't necessarily follow from your remark).

Because there is a high probability that it will do good things for us anyway. Because there is a high probability that we are going to build it anyway, and if we are stupid and attempt to force it to be our slave, it may also be smart enough to *FORCE* us to be Friendly (instead of gently guiding us there -- which it believes to be in its self-interest) -- or, even worse, it may be smart enough to annihilate us while still being dumb enough not to realize that doing so is ultimately against its own self-interest.

Note also that if you understood what I'm getting at, you wouldn't be asking this question. Any Friendly entity recognizes that, in general, having another Friendly entity is better than not having that entity.

On the other hand, if an AGI is not sufficiently intelligent, it may be dangerous even if it seems to understand some simpler constraint, like "don't touch the Earth". If it can't foresee the consequences of its actions, it can do something that will lead to the demise of old humanity some hundred years later.

YES! Which is why a major part of my Friendliness is recognizing the limits of its own intelligence and not attempting to be the savior of everything by itself -- but this is something that I really haven't gotten to yet so I'll ask you to bear with me for about three more parts and one more interlude.

It can accidentally produce a seed AI that
will grow into something completely unfriendly and take over.

It *could*, but the likelihood of that happening with an attractor Friendliness is minimal.

It can fail to contain an outbreak of an unfriendly seed AI created by humans.

Bummer. That's life. In my Friendliness, it would have only a strong general tendency to want to do so, not a hard requirement to do so.

We really want the place of power to be filled by something smart and beneficial.

Exactly. Which is why I'm attempting to describe a state that I claim is smart, beneficial, stable, and self-reinforcing.

As an aside, I think that the safety of a future society can only be guaranteed by mandatory uploading and by keeping all intelligent activities within an "operating system"-like environment which prevents direct physical influence and controls the rights of the computational processes that inhabit it, with maybe some exceptions to this rule, but only given verified surveillance at all levels to prevent a physical space-based seed AI from being created.

As a reply to your aside, I don't want to be uploaded. Are you going to force me? What right do you have to do so?

