Derek Zahn wrote:
Richard Loosemore writes:

 > You must remember that the complexity is not a massive part of the
 > system, just a small-but-indispensable part.
 >
 > I think this sometimes causes confusion: did you think that I meant
 > that the whole thing would be so opaque that I could not understand
 > *anything* about the behavior of the system? Like, all the
 > characteristics of the system would be one huge emergent property, with
 > us having no idea about where the intelligence came from?
No, certainly not. I think the confusion here involves the distinction between Friendliness with a capital F (meaning a formal theory of what the term means and an intelligent system built to provably maintain that property in the mathematical, not verbal, sense), and friendliness with a lower case f, which relies on more human types of reasoning.

Derek,

Your post raises several issues that I will try to get to in due course, but I want to deal with one of them quickly (if I can).

I am attacking the very notion that there is such a thing as mathematical Friendliness with a capital F: a property that can be proved formally, as opposed to established by some other kind of argument.

I am also claiming that, although this mythical provable Friendliness does not exist (i.e. it will never be possible), there is something else that gives us exactly what we want without being a mathematical proof.

Here is why. According to quantum mechanics there is a finite, non-zero probability that the Sun could suddenly quantum-tunnel itself to a new position inside the perfume department of Bloomingdale's.

There is no formal proof that it will not do this. There is no possibility of such a formal proof.

But we accept that we do not need to worry about this happening, because we have an idea of what the probability is. In essence, for the Sun to do that, every atom in it would have to tunnel in the same direction at the same moment; the probability of each individual event is minuscule, and since the events are independent the probabilities multiply, so the overall probability is stupidly small.

Now, of course I exaggerate for comedy, but the fact is that if you can make the event "An AGI reneges on the motivations designed into it" dependent on a very large number of improbable events all happening at once, then you can multiply the probabilities and arrive at a situation where the overall probability is vanishingly small.
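
To make the arithmetic concrete, here is a minimal Python sketch of that multiplication. The single-event probability p and the event count N are invented numbers, used only to show how fast the joint probability collapses:

    # Minimal sketch: joint probability of N independent improbable events.
    # The values of p and N are invented for illustration only.
    from math import log10

    p = 1e-3      # assumed probability of any single improbable event
    N = 10_000    # assumed number of independent events that must co-occur

    # p**N underflows ordinary floats to 0.0, so work in log space.
    log10_joint = N * log10(p)
    print(f"joint probability ~ 10^{log10_joint:.0f}")   # 10^-30000

The exact numbers do not matter; the point is that the exponent scales linearly with N, so stacking independent safeguards drives the failure probability down exponentially fast.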

Do you agree that if we could establish such a connection between the probabilities, we would be home and dry? That we need not care about "proving" the friendliness if we can show that the probability of failure is simply too low to be plausible?

Right, now consider the nature of the design I propose: the motivational system never offers an opportunity for a point failure, because everything that happens is multiply constrained (and on a massive scale: far more so than in our own brains). Once the system is set up to behave according to a diffuse set of checks and balances (tens of thousands of ideas about what is "right", rather than one single directive), it can never wander far from that set of constraints without the departure being noticed immediately.
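
For concreteness, here is a toy Python sketch of that structural claim. Everything in it (the check count, the hash-based stand-in checks, the failure rate) is hypothetical, invented purely to illustrate the shape of the idea; it is not the actual design:

    # Toy sketch: a departure from the motivational constraints escapes
    # notice only if every check in a large, diffuse ensemble is fooled
    # at once. All names and numbers here are hypothetical.

    N_CHECKS = 10_000

    def fooled(action, seed):
        # Stand-in for one acquired idea about what is "right" failing to
        # notice a bad action; happens with ~1/1000 probability here, via
        # hashing (results vary per run: Python randomizes string hashes).
        return hash((action, seed)) % 1000 == 0

    def escapes_notice(action):
        # No single point of failure: one alert check suffices to flag the
        # departure, so escape requires all N_CHECKS to be fooled at once,
        # an event of probability ~(1/1000)**10_000.
        return all(fooled(action, seed) for seed in range(N_CHECKS))

    print(escapes_notice("rewrite my own motivation system"))  # ~always False

In this caricature the checks are assumed independent, which real mental representations would not perfectly be; the sketch only shows the structural point that reneging becomes a conjunction of many improbable failures rather than a single one.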

Would you agree that IF such a design were feasible, you would not be able to think of any way to bollix it?

Let's pause the discussion there: I want to know if you can see any problems within the assumptions I have laid down.




Richard Loosemore.