On 08/06/06, William Pearson <[EMAIL PROTECTED]> wrote:


> With regards to how careful I am being with the system: one of the
> central design principles for the system is to assume the programs in
> the hardware are selfish and may do things I don't want. The failure
> mode I envisage as more likely than exponential self-improvement is
> wireheading, but the safeguards for making sure it can't wirehead also
> make sure it is weak.

Eugen asked me off-list whether I meant wireheading in humans or in the AI.

I mean it in the AI. My main model of how weak intelligent systems work is
decentralised reinforcement learning combined with a very loose form of
neural darwinism. Each program is a selfish replicator whose goal is not
so much to get reinforcement as to survive; if the system works to plan,
they should do this by attempting to get positive reinforcement in the
prescribed fashion.
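To make that concrete, here is a toy sketch of the kind of dynamic I mean
(plain Python, every name invented for the example; it is not the actual
system):

import random

class Program:
    def __init__(self, bias):
        self.bias = bias      # stands in for whatever behaviour the program has
        self.credit = 1.0     # accumulated reinforcement; its survival currency

    def act(self):
        # the program succeeds at the task with probability equal to its bias
        return random.random() < self.bias

def generation(population, reward=1.0, cost_of_living=0.2):
    for p in population:
        if p.act():
            p.credit += reward        # reinforcement earned the prescribed way
        p.credit -= cost_of_living    # doing nothing is not a survival strategy
    survivors = [p for p in population if p.credit > 0]
    if not survivors:
        survivors = [Program(random.random())]
    # loose neural darwinism: the dead are replaced by mutated copies of the fittest
    while len(survivors) < len(population):
        parent = max(survivors, key=lambda p: p.credit)
        child_bias = min(1.0, max(0.0, parent.bias + random.gauss(0, 0.05)))
        survivors.append(Program(child_bias))
    return survivors

population = [Program(random.random()) for _ in range(20)]
for _ in range(200):
    population = generation(population)
print(sum(p.bias for p in population) / len(population))  # drifts upward over generations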

However, if the system breaks, they don't actually need to care about
getting positive reinforcement or the manner in which it occurs; all they
would need to care about is survival. So one failure mode is wireheading:
getting reinforcement in a manner the system designer doesn't choose.
Another is a program reducing the reinforcement a competitor can get
without being penalised itself, so that the competitor can't overwrite
the sabotaging program (e.g. running the battery down by computing, so
the controlling program can't achieve its goals). Both failure modes tend
to suggest the AI would sit gibbering in the corner, rather than taking
over the universe.
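As a toy illustration of the first failure mode (again just invented
Python, not the real reinforcement machinery): the only difference between
a healthy program and a wireheaded one is who gets to write to the reward
channel.

class RewardChannel:
    def __init__(self):
        self.total = 0.0

    def credit(self, amount, source):
        # the designer intends the environment to be the only valid source
        if source != "environment":
            raise PermissionError("reinforcement must arrive through the prescribed channel")
        self.total += amount

channel = RewardChannel()
channel.credit(1.0, source="environment")        # the prescribed fashion
try:
    channel.credit(100.0, source="program_42")   # a wireheading attempt
except PermissionError as err:
    print("blocked:", err)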

I am also interested in avoiding monopolies on information within the
system. If that isn't done, the evolutionary mechanisms which choose
which programs should be within the system would break down, as the
program with the monopoly would have too much power. This failure mode
would be characterised by an inability to get rid of, change, or improve
upon a part of the system. That is, the programs of the system would
naturally tend towards conservatism unless forced by evolutionary
pressure. So again, not a particularly exciting failure mode.

As no one program would be allowed to read all the other programs (to
avoid information monopoly and giving away information to
competitors), the system as a whole should be as ignorant of the
meaning/purpose of each program/setting as a human would be if given
access to the important variables within their own brain. So the best
it could do if attempting to botnet the whole Internet would be to
make a copy of itself or send individual programs out.
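For what it's worth, the read restriction I have in mind looks something
like this sketch, with a made-up cap on how many peers any one program
may ever inspect:

class Registry:
    MAX_READ_FRACTION = 0.1   # no program may ever inspect more than a small slice of its peers

    def __init__(self, program_ids):
        self.program_ids = set(program_ids)
        self.reads = {pid: set() for pid in program_ids}

    def read(self, reader, target):
        allowed = int(self.MAX_READ_FRACTION * len(self.program_ids))
        if len(self.reads[reader] | {target}) > allowed:
            raise PermissionError(f"{reader} would monopolise information about its competitors")
        self.reads[reader].add(target)
        return f"<code of {target}>"   # stand-in for the target program's code

registry = Registry([f"p{i}" for i in range(20)])
print(registry.read("p0", "p1"))       # fine: a small peek at one competitor
try:
    for i in range(2, 20):
        registry.read("p0", f"p{i}")   # trying to read everything
except PermissionError as err:
    print("blocked:", err)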

So the first line of defense is preventing the robot in the real world
from having access to its own complete code, but this is the same as the
first line of defense in stopping physical wireheading. I am mainly
interested in manipulator-less robots such as the Oxford wearable robot,
but for ones with manipulators you could make them not like to access
their own internals, giving them negative feedback for attempting to
open themselves up.
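A crude sketch of that last point, with invented action names: the penalty
is wired in regardless of whether opening the case would have helped with
the task.

SELF_ACCESS_ACTIONS = {"open_case", "probe_own_memory", "detach_panel"}

def reinforcement(action, task_reward):
    # any attempt to get at the robot's own internals earns a fixed penalty
    if action in SELF_ACCESS_ACTIONS:
        return -10.0
    return task_reward

print(reinforcement("pick_up_cup", 1.0))   #  1.0
print(reinforcement("open_case", 1.0))     # -10.0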

I suppose I might be downplaying the existential risks of strong
self-improvement, but as I am interested in vertebrate kinds of
intelligence, I am more concerned about the failure modes we see in
everyday life (addiction, OCD) in those sorts of systems. The measures I
put in place to try to prevent these failures also happen to put road
blocks in the path to it becoming strongly self-improving.

Will Pearson
