Derek Zahn wrote:
> Richard:
>> You agree that if we could get such a connection between the
>> probabilities, we are home and dry? That we need not care about
>> "proving" the friendliness if we can show that the probability is simply
>> too low to be plausible?
> Yes, although the probability itself would have to be proven from first
> principles to be as strong as Friendliness. For any actual system such
> rigor seems as unlikely as Friendliness itself.
Oh, I think that is too strong a reservation: you would have to know the
design to be sure about how the probabilities would be calculated, and I
am far from having described either the design or the calculations at
the moment.
I would argue that if the system is built in such a way that everything
in it must happen as a result of multiple constraint satisfaction (all
the processes are governed by it, so there are no weak spots where
something could come in and take it over), then all events are subject
to the same bullet-proof resistance to tampering. The implication of
that, for the calculation of probabilities, is that the behavior of the
system becomes more like a statistical mechanics problem: you can treat
the probability of certain kinds of events as determined by simple,
uniform factors, and do the math on them.
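Just to give the crudest possible illustration of the kind of gating I
mean (this is only a toy sketch, nothing like the real mechanism, and all
the names and numbers in it are invented for the example):

    # Toy sketch: a proposed internal action goes ahead only if it
    # jointly satisfies a large, diffuse set of constraints, rather
    # than being gated by one single master rule.
    def satisfies_all(action, constraints):
        """True only if no constraint objects to the action."""
        return all(score(action) >= 0.0 for score in constraints)

    # Each constraint scores an action; a negative score means "this
    # violates me". Imagine tens of thousands of these, acquired the
    # way concepts are acquired, not hand-coded like the two below.
    constraints = [
        lambda a: -1.0 if a == "seize_control_of_scheduler" else 1.0,
        lambda a: -1.0 if a == "rewrite_own_goal_system" else 1.0,
    ]

    proposed = "rewrite_own_goal_system"
    print("proceeds" if satisfies_all(proposed, constraints)
          else "blocked by the constraint network")

The point of the toy is only that no single component gets to decide on
its own: every event has to clear the whole diffuse set at once.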
(The same argument is then applied to all the lower-level hardware
layers, by the way: the AGI itself would help to design underlying
hardware that involves distributed constraint satisfaction as far down
as possible.)
So you see the goal is to map the architecture onto a class of
statistical mechanics problems, and then do the math from there. IF that
mapping is possible, then the calculation becomes relatively trivial.
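To show why the math could then be so simple, here is the most naive
version of the calculation (assuming, purely for the sake of
illustration, that the constraints act independently, which is exactly
the kind of assumption the statistical mechanics mapping would have to
justify, and with all the numbers invented):

    # If a rogue event has to slip past N independent constraints, each
    # of which notices it with probability p, the chance of it evading
    # every one of them falls off exponentially with N.
    p = 0.001    # chance that any ONE constraint catches it (invented)
    N = 10_000   # "tens of thousands of ideas about what is right"

    prob_evades_all = (1 - p) ** N
    print(f"P(evades every constraint) = {prob_evades_all:.2e}")
    # about 4.5e-05 here; with p = 0.01 it drops to roughly 2.2e-44

The real calculation would have to deal with correlations between the
constraints, of course, but that is exactly the sort of thing the
statistical mechanics treatment is supposed to handle.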
I especially cannot agree that this is in the same class as "proving"
friendliness with a capital F .... we cannot even begin to get ANY idea
about how to do that! There is a world of difference between where
Yudkowsky's idea of "capital-F" friendliness has gotten to, and the
proposal I have outlined here: I have given a strategy for mapping the
problem onto a known class of problems, I think.
>> Once the system is
>> set up to behave according to a diffuse set of checks and balances (tens
>> of thousands of ideas about what is "right", rather than one single
>> directive), it can never wander far from that set of constraints without
>> noticing the departure immediately.
>>
>> Would you agree that IF such a design were feasible, you would not be
>> able to think of any way to bollix it?
> * They have to be the right set of checks and balances, one that
> completely covers the ill-defined territory
> * Nothing unforeseen can arise that is not covered by the designed-in
> checks and balances
> * The meaning of the constraints has to be applicable to all future
> developments somehow (e.g. the changing nature of humanity)
> * The meaning of the constraints and the complex items they operate on
> has to be immune to drift
> Given all that, nothing springs immediately into my little mind to
> disagree with your conclusion.
>
> Note that I think this type of approach is an excellent way to try for
> little-f friendliness, which is probably our best and only option. I
> like it a lot.
Okay, this is good.
In your above list, you must remember that the "meaning" of the
constraints cannot be treated in the same way that people try to treat
the "meaning" of facts stored in a traditional AI system. That could be
a big source of misunderstanding.
For example, there would never be a situation where a constraint looks
like "Make sure the humans have enough food", and the system then has to
go through some mechanism that interprets the meaning of that sentence
in some rule-governed way.
This is a huge area, so I cannot go into the details here, but the
bottom line is that the constraints would not be able to drift, or
become inapplicable to future needs, because the source of those
constraints is something deeper, something which in effect says "Keep
the collective needs of humanity in mind, even as those needs might
drift over the millennia." I think all four of your points above can be
amply dealt with.
Richard Loosemore