Richard:

> You agree that if we could get such a connection between the
> probabilities, we are home and dry? That we need not care about
> "proving" the friendliness if we can show that the probability is simply
> too low to be plausible?

Yes, although the probability itself would have to be proven from first
principles to be as strong as Friendliness. For any actual system that
degree of rigor seems as unlikely as Friendliness itself.

> Once the system is set up to behave according to a diffuse set of checks
> and balances (tens of thousands of ideas about what is "right", rather
> than one single directive), it can never wander far from that set of
> constraints without noticing the departure immediately.
>
> Would you agree that IF such a design were feasible, you would not be
> able to think of any way to bollix it?

* They have to be the right set of checks and balances, ones that completely
  cover ill-defined territory
* Nothing unforeseen can arise that is not covered by the designed-in checks
  and balances
* The meaning of the constraints has to be applicable, somehow, to all future
  developments (e.g. the changing nature of humanity)
* The meaning of the constraints, and of the complex items they operate on,
  has to be immune to drift

Given all that, nothing springs immediately into my little mind to disagree
with your conclusion. Note that I think this type of approach is an excellent
way to try for little-f friendliness, which is probably our best and only
option. I like it a lot.
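For concreteness, here is a minimal sketch (in Python) of the arithmetic the
probability argument leans on, under the assumption, and it is only an
assumption, that the individual checks fail independently; the function name,
the per-check probability p and the counts k below are made up for
illustration:

import math

# Minimal sketch of the probability-multiplication argument, nothing more.
# Assumption (mine, for illustration): a motivational failure requires k
# independent checks to fail at the same time, each with probability p.
# If the failures are correlated rather than independent, none of this
# follows; the independence is the part that would have to be proven.

def log10_joint_failure(p: float, k: int) -> float:
    """log10 of the probability that k independent checks, each with
    failure probability p, all fail simultaneously (i.e. log10(p ** k))."""
    return k * math.log10(p)

p = 1e-3  # assumed per-check failure probability
for k in (1, 10, 1_000, 10_000):
    print(f"k = {k:6d}: joint failure probability ~ 10^{log10_joint_failure(p, k):.0f}")

The whole force of the conclusion sits in that independence assumption, which
is why I say the probability would itself have to be proven from first
principles.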
> Date: Mon, 1 Oct 2007 11:34:09 -0400
> From: [EMAIL PROTECTED]
> To: [email protected]
> Subject: Re: [agi] Religion-free technical content
>
> Derek Zahn wrote:
> > Richard Loosemore writes:
> >
> > > You must remember that the complexity is not a massive part of the
> > > system, just a small-but-indispensable part.
> > >
> > > I think this sometimes causes confusion: did you think that I meant
> > > that the whole thing would be so opaque that I could not understand
> > > *anything* about the behavior of the system? Like, all the
> > > characteristics of the system would be one huge emergent property,
> > > with us having no idea about where the intelligence came from?
> >
> > No, certainly not. I think the confusion here involves the distinction
> > between Friendliness with a capital F (meaning a formal theory of what
> > the term means and an intelligent system built to provably maintain
> > that property in the mathematical, not verbal, sense), and friendliness
> > with a lower case f, which relies on more human types of reasoning.
>
> Derek,
>
> Your post raises several issues that I will try to get to in due course,
> but I want to deal with one of them quickly (if I can).
>
> I am attacking the very notion that there really is something that is
> mathematical Friendliness with a capital F, which can be proved formally
> rather than (something else).
>
> I am also stating that while this mythical provable-friendliness does
> not really exist (i.e. it will never be possible), there is something
> else that gives us exactly what we want, but is not a mathematical proof.
>
> Here is why. According to quantum mechanics there is a finite, non-zero
> probability that the Sun could suddenly quantum-tunnel itself to a new
> position inside the perfume department of Bloomingdale's.
>
> There is no formal proof that it will not do this. There is no
> possibility of such a formal proof.
>
> But we accept that we do not need to worry about this happening because
> we have an idea of what the probability is. In essence, we know that
> for the Sun to do that, each atom in it would have to do the same thing
> all at once, and since the probability of each individual event is so
> small, and since they are all multiplied, the overall probability is
> stupidly small.
>
> Now, of course I exaggerate for comedy, but the fact is that if you can
> make the event "An AGI reneges on the motivations designed into it"
> dependent on a very large number of improbable events all happening at
> once, then you can multiply the probabilities and come to a situation
> where the overall probability is vanishingly small.
>
> You agree that if we could get such a connection between the
> probabilities, we are home and dry? That we need not care about
> "proving" the friendliness if we can show that the probability is simply
> too low to be plausible?
>
> Right, now consider the nature of the design I propose: the
> motivational system never has an opportunity for a point failure:
> everything that happens is multiply constrained (and on a massive
> scale: far more than is the case even in our own brains).
> Once the system is set up to behave according to a diffuse set of checks
> and balances (tens of thousands of ideas about what is "right", rather
> than one single directive), it can never wander far from that set of
> constraints without noticing the departure immediately.
>
> Would you agree that IF such a design were feasible, you would not be
> able to think of any way to bollix it?
>
> Let's pause the discussion there: I want to know if you can see any
> problems within the assumptions I have laid down.
>
>
> Richard Loosemore.
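To pin down the structural point in the last two quoted paragraphs, here is a
deliberately trivial sketch of that kind of gate; the constraints and the
permitted() helper are stand-ins of my own, not anything from the actual
design:

from typing import Callable, List

# Deliberately trivial sketch of the structural point: a motivational gate
# with no single point of failure. Every constraint must agree before an
# action proceeds, so a departure from the designed-in motivations would
# require many checks to go wrong at once. The constraints below are
# stand-ins; the real ones would be tens of thousands of learned ideas
# about what is "right".

Constraint = Callable[[str], bool]  # True means the action looks acceptable

def permitted(action: str, constraints: List[Constraint]) -> bool:
    """Allow the action only if it satisfies every constraint."""
    return all(check(action) for check in constraints)

constraints: List[Constraint] = [
    lambda a: "harm" not in a.lower(),
    lambda a: "deceive" not in a.lower(),
    lambda a: not a.startswith("override"),
]

print(permitted("tidy the lab", constraints))           # True
print(permitted("deceive the operators", constraints))  # False

Whether tens of thousands of such checks can really cover ill-defined
territory, and stay meaningful as the things they operate on change, is
exactly the set of caveats listed above.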
