Derek Zahn wrote:
Richard:
> You agree that if we could get such a connection between the
> probabilities, we are home and dry? That we need not care about
> "proving" the friendliness if we can show that the probability is simply
> too low to be plausible?

Yes, although the probability itself would have to be proven from first principles to be as strong as Friendliness. For any actual system such rigor seems as unlikely as Friendliness itself.

Oh, I think that is too strong a reservation: you have to know the design to be sure about how the probabilities would be calculated, and I am far from describing the design and the calculations at the moment.

I would argue that if the system is built in such a way that everything in it must happen as a result of multiple constraint satisfaction (all the processes are governed by it, so there are no weak spots where something could come in and take it over), then all events are subject to the same bullet-proof resistance to tampering. The implication of that, for the calculation of probabilities, is that the behavior of the system becomes more like a statistical mechanics problem: you can treat the probability of certain kinds of events as being determined by simple, uniform factors, and do the math on them.

(The same argument is then used for all the lower-level hardware layers, by the way: the AGI itself would help to design underlying hardware that involves distributed constraint satisfaction right down as far as possible.)

So you see, the goal is to map the architecture onto a class of statistical mechanics problems, then do the math from there. IF that mapping is possible, then the calculation becomes relatively trivial.
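To give a flavor of why that mapping would make the math tractable, here is a minimal sketch. It is purely illustrative: the constraint count and per-constraint failure rate are invented numbers, not derived from any actual design. The point is only that if a harmful event requires many independent constraints to fail simultaneously, its probability falls off exponentially with the number of constraints, in the same way an improbable macrostate does in statistical mechanics.

```python
import math

def log10_takeover_probability(n_constraints: int, p_fail: float) -> float:
    """log10 of the probability that n independent constraints all fail
    at once. Assumes independence -- the crux of the statistical-mechanics
    analogy -- which a real analysis would have to justify."""
    return n_constraints * math.log10(p_fail)

# Tens of thousands of checks and balances, each with a (deliberately
# generous) 10% chance of failing in any given episode:
exponent = log10_takeover_probability(10_000, 0.1)
print(f"P(takeover) ~ 10^{exponent:.0f}")  # ~ 10^-10000
```

Even under these cartoonishly pessimistic per-constraint assumptions, the joint event is astronomically improbable; the hard part, of course, is showing that the independence assumption actually holds for the architecture in question.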

I especially cannot agree that this is in the same class as "proving" friendliness with a capital F .... we cannot even begin to get ANY idea about how to do that! There is a world of difference between where Yudkowsky's idea of "capital-F" friendliness has gotten to, and the proposal I have outlined here: I have given a strategy for mapping the problem onto a known class of problems, I think.




 > Once the system is
 > set up to behave according to a diffuse set of checks and balances (tens
 > of thousands of ideas about what is "right", rather than one single
 > directive), it can never wander far from that set of constraints without
 > noticing the departure immediately.
 >
 > Would you agree that IF such a design were feasible, you would not be
 > able to think of any way to bollix it?

* They have to be the right set of checks and balances, one that completely covers ill-defined territory
* Nothing unforeseen can arise that is not covered by the designed-in checks and balances
* The meaning of the constraints has to be applicable, somehow, to all future developments (e.g. the changing nature of humanity)
* The meaning of the constraints, and of the complex items they operate on, has to be immune to drift

Given all that, nothing springs immediately into my little mind to disagree with your conclusion. Note that I think this type of approach is an excellent way to try for little-f friendliness, which is probably our best and only option. I like it a lot.

Okay, this is good.

In your above list, you must remember that the "meaning" of the constraints cannot be treated in the same way that people try to treat the "meaning" of facts stored in a traditional AI system. That could be a big source of misunderstanding.

For example, there is no situation where a constraint looks like "Make sure the humans have enough food", and then the system has to go through some mechanism that interprets the meaning of that sentence in some rule-governed way.

This is a huge area, so I cannot get into the detail, but the bottom line is that the constraints would not be able to drift, or become inapplicable to future needs, because the source of those constraints is something deeper, something which in effect says "Keep the collective needs of humanity in mind, even as those needs might drift over the millennia." I think all of your four points above can be amply dealt with.



Richard Loosemore




-----
This list is sponsored by AGIRI: http://www.agiri.org/email