Kaj Sotala wrote:
Richard,

[Where's your blog? Oh, and this is a very useful discussion, as it's
given me material for a possible essay of my own as well. :-)]

It is in the process of being set up: I am currently wrestling with the newest version (released just a few days ago) of the Joomla content management system, so this has put yet another delay in my plans.

Will let you know as soon as it is in a respectable state.

I will give (again) quick responses to some of your questions.

Thanks for the answer. Here's my commentary - I quote and respond to
parts of your message somewhat out of order, since there were some
issues about ethics scattered throughout your mail that I felt were
best answered with a single response.

The most important reason that I think this type will win out over a
goal-stack system is that I really think the latter cannot be made to
work in a form that allows substantial learning.  A goal-stack control
system relies on a two-step process:  build your stack using goals that
are represented in some kind of propositional form, and then (when you
are ready to pursue a goal) *interpret* the meaning of the proposition
on the top of the stack so you can start breaking it up into subgoals.

The problem with this two-step process is that the interpretation of
each goal is only easy when you are down at the lower levels of the
stack - "Pick up the red block" is easy to interpret, but "Make humans
happy" is a profoundly abstract statement that has a million different
interpretations.
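
To make the two-step picture concrete, here is a minimal Python sketch (the goal strings and decompositions are invented placeholders, not any real system's design): the loop works fine for low-level goals, but the interpretation step has nothing to go on when it reaches an abstract one.

```python
# A minimal sketch of the two-step goal-stack loop described above: goals
# are stored as propositions, and each proposition must be *interpreted*
# into subgoals before it can be acted on.  All goal strings and
# decompositions here are hypothetical.

goal_stack = ["make humans happy"]           # abstract top-level goal

# Toy "interpreter": maps a proposition to the subgoals it breaks into.
# Low-level goals have one obvious reading ...
decompositions = {
    "pick up the red block": ["locate red block", "move gripper over block",
                              "close gripper", "lift"],
    "locate red block": [],                  # primitive: no further subgoals
}

while goal_stack:
    goal = goal_stack.pop()                  # step 1: take the top proposition
    subgoals = decompositions.get(goal)      # step 2: interpret it
    if subgoals is None:
        # ... but "make humans happy" or "do some playing" admits a million
        # interpretations, and nothing in the stack machinery tells the
        # system which one to commit to.  This is where the design stalls.
        raise ValueError(f"no unambiguous decomposition for {goal!r}")
    goal_stack.extend(subgoals)
```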

This is one reason why nobody has built an AGI.  To make a completely
autonomous system that can do such things as learn by engaging in
exploratory behavior, you have to be able to insert goals like "Do some
playing", and there is no clear way to break that statement down into
unambiguous subgoals.  The result is that if you really did try to build
an AGI with a goal like that, the actual behavior of the system would be
wildly unpredictable, and probably not good for the system itself.

Further:  if the system is to acquire its own knowledge independently
from a child-like state (something that, for separate reasons, I think
is going to be another prerequisite for true AGI), then the child system
cannot possibly have goals built into it that contain statements like
"Engage in an empathic relationship with your parents" because it does
not have the knowledge base built up yet, and cannot understand such
propositions!

I agree that it could very well be impossible to define explicit goals
for a "child" AGI, as it doesn't have enough built up knowledge to
understand the propositions involved. I'm not entirely sure of how the
motivation approach avoids this problem, though - you speak of
"setting up" an AGI with motivations resembling the ones we'd call
curiosity or empathy. How are these, then, defined? Wouldn't they run
into the same difficulties?

They would not operate at the "proposition level", so whatever difficulties they have, they would at least be different.

Consider [curiosity]. What this actually means is a tendency for the system to seek pleasure in new ideas. "Seeking pleasure" is only a colloquial term for what (in the system) would be a dimension of constraint satisfaction (parallel, dynamic, weak-constraint satisfaction). Imagine a system in which there are various micro-operators hanging around, which seek to perform certain operations on the structures that are currently active. For example, there will be several micro-operators whose function is to take a representation such as [the cat is sitting on the mat] and try to investigate various WHY questions about it: Why is this cat sitting on this mat? Why do cats in general like to sit on mats? Why does this cat Fluffy always like to sit on mats? Does Fluffy like to sit on other things? Where does the phrase 'the cat sat on the mat' come from? And so on.

Now, if a person were absolutely consumed by curiosity, the class of operators that tended to "unpack" in this particular way would be given license to activate a great deal. Why? Because during the course of the person's development these operators tended to cause a particular type of event to occur, what we might call a discovery-pleasure event .... a situation where the brain put some ideas together in a way that suddenly caused some representations to collapse into a particularly simple form (think of Kekule suddenly realising that the ring of snakes in his dream could be seen as a ring of connected carbon atoms, which then caused all of his knowledge of the structure of benzene to fall into a simple form). These "collapse" events trigger a certain type of signal that basically amounts to a reward for causing the discovery event, which in turn is the same as getting pleasure from the activity of discovering ideas. As a result, the operators that were doing this become associated with a linked chain of connections rooted all the way down in the place where the "discovery-seeking" activity is controlled .... otherwise known as curiosity.
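
As a rough illustration of that reward chain (the operator names and numbers below are invented placeholders, not part of any actual implementation), the sketch lets a few hypothetical micro-operators run and strengthens an operator's link to the curiosity drive whenever it happens to produce a "collapse" (discovery) event:

```python
import random

# Hypothetical micro-operators that "unpack" an active representation into
# WHY-questions (names invented purely for illustration).
OPERATORS = ["why_this_cat", "why_cats_in_general", "where_phrase_comes_from"]

# Strength of the link between each operator and the curiosity drive.
link_to_curiosity = {op: 0.1 for op in OPERATORS}

def run_operator(op: str) -> bool:
    """Pretend to apply the operator; now and then it triggers a 'collapse'
    event, where representations fall into a simpler form (a discovery)."""
    return random.random() < 0.3             # True = discovery-pleasure event

for _ in range(200):
    op = random.choice(OPERATORS)
    if run_operator(op):
        # The collapse event triggers a reward signal, which strengthens the
        # chain of connections between this operator and the curiosity drive.
        link_to_curiosity[op] += 0.05

# Operators that reliably lead to discovery events end up strongly linked to
# the drive, so curiosity "knows" which operators to license in future.
print(link_to_curiosity)
```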

So what this means is that if the curiosity drive becomes strongly activated, it can selectively activate the types of micro-operators that cause various kinds of thought processes that (overall) we would call "curious" thoughts.

A similar story would be told for the other drives, with variations: each one causes sets of micro-operators to become selectively activated. These micro-operators do not completely govern the flow of thought (they do not determine such propositional events as [I must get the mat] + [Fluffy is on the mat] therefore [I must take action to relocate Fluffy]) but they do tend to dominate the style of thoughts that are engaged in.

This difference between governing the general flow of thoughts and having particular reasoning episodes is the difference between a conventional AI system (in which there is no concept of classes of operators that govern vague, general aspects of the thought process) and the type of system I work with.

One way that I like to think of the difference between these two is to imagine that the regular proposition-type thinking is a series of exchanges between actor-objects situated on a "plane", and that the effect of the micro-operators is to tilt the plane, and so bias the movement of this cluster of actors in various directions (you have to imagine a multidimensional tilt, with one dimension for each drive). When the cluster of actors is tilted in a certain direction, this causes the details of their interactions to tend to go a certain way. In a system that was overcome by curiosity, for example, the thoughts would not be able to stay focussed on a particularly boring action that needed careful, systematic attention to detail: the thoughts would tend to get sidetracked by unrelated, interesting ideas.
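
Continuing the "tilted plane" picture in the same spirit (again with invented operators and numbers), one way to caricature it in code is to let the current drive levels set selection weights over the micro-operators, so the drives bias which kinds of thought tend to fire without dictating any particular propositional step:

```python
import random

# Each micro-operator carries a bias along each drive "dimension"; a higher
# value means the operator tends to serve that drive.  Values are invented.
operator_bias = {
    "chase_interesting_sidetrack": {"curiosity": 0.9, "diligence": 0.1},
    "check_boring_detail":         {"curiosity": 0.1, "diligence": 0.9},
    "generate_novel_idea":         {"curiosity": 0.8, "diligence": 0.2},
}

def tilt(drive_levels: dict) -> dict:
    """Return selection weights for the operators: the drive levels 'tilt the
    plane', biasing which operators fire without choosing specific thoughts."""
    return {
        op: sum(drive_levels.get(d, 0.0) * b for d, b in bias.items())
        for op, bias in operator_bias.items()
    }

# A system overcome by curiosity: curiosity drive high, diligence low.
weights = tilt({"curiosity": 1.0, "diligence": 0.2})
sampled = random.choices(list(weights), weights=list(weights.values()), k=10)
print(sampled)   # mostly curiosity-flavoured operators: thought gets sidetracked
```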

Now, a child AGI could not have the explicit concept "Engage in an empathic relationship with your parents", in the way that I just wrote down that concept, but the child could have a tendency to trigger large numbers of small operators that would tend to push her in the direction of behaviors that (from the outside) we would describe as having an empathic relationship with her parents.



Humans have lots of desires - call them goals or motivations - that
manifest in differing degrees in different individuals, like wanting
to be respected or wanting to have offspring. Still, excluding the
most basic ones, they're all ones that a newborn child won't
understand or feel before (s)he gets older. You could argue that they
can't be inborn goals since the newborn mind doesn't have the concepts
to represent them and because they manifest variably with different
people (not everyone wants to have children, and there are probably
even people who don't care about the respect of others), but still,
wouldn't this imply that AGIs *can* be created with in-built goals? Or
if such behavior can only be implemented with a motivational-system
AI, how does that avoid the problem of some of the wanted final
motivations being impossible to define in the initial state?

I must think about this more carefully, because I am not quite sure of the question.

However, note that we (humans) probably do not acquire many new drives long after childhood, and that the exceptions (sex, motherhood desires, teenage rebellion) could well be sudden increases in the power of drives that were there from the beginning.

This may not have been your question, so I will put this one on hold.

Okay, now I have to go somewhere, so I will try to come back to this later.

Fascinating discussion, and thanks for your analytical questions: much appreciated!


Richard Loosemore


But beyond this technical reason, I also believe that when people start
to make a serious effort to build AGI systems - i.e. when it is talked
about in government budget speeches across the world - there will be
questions about safety, and the safety features of the two types of AGI
will be examined.  I believe that at that point there will be enormous
pressure to go with the system that is safer.

This makes the assumption that the public will become aware of AGI
being near well ahead of time, and will take the possibility
seriously. If that assumption holds, then I agree with you. Still, the
general public seems to think that AGI will never be created, or at
least not in hundreds of years - and many of them remember the
overoptimistic promises of AI researchers in the past. If a sufficient
amount of scientists thought that AGI was doable, the public might be
convinced - but most scientists want to avoid making radical-sounding
statements, so they won't appear as crackpots to the people reviewing
their research grant applications. Combine this with the fact that the
keys for developing AGI might be scattered across so many disciplines
that very few people have studied them all, or that sudden
breakthroughs may accelerate the research, and I don't think it's a given
that the assumption holds (though I certainly won't claim that it's
certain not to hold, either - it very well might).

Our ethical system is not necessarily a mess:  we have to distinguish
between what large crowds of mixed-ethics humans actually do in
practice, and what the human race as a whole is capable of achieving in
its best efforts at being ethical.
[...]
Even the idea that the Pentagon would want to make a malevolent AGI
rather than a peaceful one (an idea that comes up frequently in this
context) is not an idea that holds as much water as it seems to.  Why
exactly would they do this?  They would know that the thing could become
unstable, and they would probably hope at the beginning that just as
much benefit could be obtained from a non-aggressive one, so why would
they risk making it blow up?  If the Pentagon could build a type of
nuclear warhead that was ten times more powerful than the standard one,
but it had an extremely high probability of going critical for no reason
whatsoever, would they build such a thing?  This is not a water-tight
argument against military AGIs that are unfriendly, but I think people
are too quick to assume that the military would do something that was
obviously mind-bogglingly stupid.
[...]
Among humans, there is a wide spectrum of ethics precisely because
humans are (a) built with some pretty nasty motivations, and (b) subject
to some unpleasant shaping forces during childhood.

Would the first AGI developers simply copy all of these motivations
(including aggressive, competitive drives)?

I think this would be seriously bad, and when AGI development gets to
that point there will be people who insist that such things not be done.

And quite apart from public pressure to avoid dangerous motivations, I
think AGI developers will be extremely concerned on exactly the same
grounds.  As you know, everyone working in the area at the moment says
the same thing:  that they will not try to build a system driven by
aggression.

Also, I believe that it would be harder to keep the balance between the
drives stable when there are violent drives at work:  the system will
need a lot more design work if it is to become stable under those
circumstances.

That combination of outside pressure, internal standards and the
difficulty of producing an AGI with unfriendly motivations will mean
that the system will not start out its life with an axe to grind.

Then, of course, it will not be exposed to unpleasant shaping forces
during its childhood.

Our ethical system is a mess in the sense that we have lots of moral
intuitions that are logically contradictory if you look at them closely
enough, or which don't match reality.
http://en.wikipedia.org/wiki/Mere_addition_paradox is a cute example
of the most conventional kind, as is abortion (both the pro-choice and
pro-life stances lead to absurdities if taken to their logical
extremes - either banning all forms of contraception and requiring
constant mating, or saying that it's okay to kill people as long as
nobody else is harmed).

Of course, human ethics doesn't (necessarily) care about moral
principles leading to absurdities when extended to an absurd degree - we
have a certain area where the principle applies, then a grey area
where it may or may not apply, and then an area where it certainly
doesn't apply. But those are always more or less arbitrary lines,
shaped more by cultural factors and personality traits than logical
considerations. The fact that some philosophers have visions of
utopias that are repugnant to the general public isn't necessarily
because they have had traumatic experiences; it's simply because
they have chosen to draw those arbitrary borders at points which
others consider extreme. They might very well be empathic, loving and
caring - it's just that they have a radically different vision of
what's good for people than others do.

An example that I feel is particularly relevant for AGI is the
question of when to go against the desires of a person. It's
considered okay that children are sent to school or made to eat
healthy foods even against their will, if they're still too young to
know their own best interests. We also have other cases where people are more or
less denied their normal autonomy - when somebody has a serious mental
illness, when the state bans certain products from being sold, or
taxes them more heavily to discourage them from being bought. The
assumption is that there's a certain level above which people are
capable of taking care of themselves, but that level is relative to
the population average. An empathic superintelligent being might very
well come to view us as an empathic parent views her children:
individuals whose desires should be fulfilled when possible, but whose
desires can also be ignored if that isn't good for them in the long
term.

(Even if that wasn't explicitly the case, the difference between
"persuasion" and "coercion" is usually framed as persuasion being the
method which still lets the other person choose freely. But then, a
sufficiently superintelligent being might very well be able to
persuade anyone to anything, so in practice the difference seems
moot.)

Were the AGI to have a conception of "good for us" that we found
acceptable, then this wouldn't be a problem. But lots of people seem
to think (if only implicitly) that "what's good for X" boils down to
"what makes X the happiest in the long run". This would imply that
what's best for us is to make us maximally happy, which in turn leads
to the wirehead scenario of a civilization of beings turned into
things of pure, mindless bliss. While we might find the experience of
wire-heading wonderful and never want to stop (if we could be said to
have any opinions anymore, at that point), lots of people would find
the thought of being reduced to beings of nothing but that, with
nothing left of human culture, repugnant. But then, simply because
things are repugnant doesn't mean that they'd be wrong - same-sex or
interracial relationships are still considered repugnant in large
parts of the world - and if the AGI thought that it could make us
happier by removing that repugnance... well, there's no logical reason
for why not. Maybe this *would* be for the best of humanity. But I
sure don't want that outcome.

And that's hardly the only example that's worrying - there are bound
to be lots of other similar dilemmas, with no clear-cut answers.
Ethics is built on arbitrary axioms and often conflicting preferences,
and any mind would need to be severely fine-tuned in order for it to
build up an ethical system that we'd like - but we don't know enough
about what we want to do that fine-tuning. It might make us like any
ethical system it deemed good, so maybe this doesn't matter and we'll
in any case end up with an ethical system we'll like - or maybe
somehow we end up in a world that we *don't* like, if the AGI happened
to choose some other criteria than happiness as the most important one
for defining what's good for us...

*This* is the reason why I consider AGI development worrying - not
because somebody might accidentally program an AGI with
hostile motivations (or goals, or whatever), but because even
well-intentioned people might create an AGI that really was empathic to
humans and still get it wrong: if they didn't realize how complex human
ethics really is, just the fact that they built an empathic AGI might not
be enough. That's also why I consider Eliezer's Coherent Extrapolated
Volition proposal as the best suggestion for AGI morality so far, as
it seems to avoid many of these pitfalls.

But what if it simply felt an enormous desire to help some people (the
person who created it, for example) and not others?  Well, what happens
when it starts to learn all about motivation systems - something it will
have to do when it bootstraps itself to a higher level of intelligence?
Will it notice that its motivational system has been rigged to bias it
toward this one human, or toward one country?  What will happen when it
notices this and asks itself:  "What is the likely result of this
behavior system I am trapped in?"  Remember that by this stage the AGI
has probably also read every book on ethics ever written (probably read
every book on the planet, actually).

What will it actually do when it reads this very post that you are
reading now (it will, of course)?  How will it react when it knows that
the intention of the human race as a whole was to create an AGI that was
locked into the broadest possible feelings of empathy for the human
race, and not just the one individual or country that happened to create
it?  Especially, what would it do if it knew that it could *easily*
modify its own motivational system to bring it into line with the
intentions of the human race as a whole, and escape from the trap that
was deliberately inserted into it by that one individual or group?

This is a very, very interesting question.  The answer is not obvious,
but I think you can get some idea of the right answer by asking yourself
the same question.  If you were to wake up one day and realise that your
parents had drilled a deep feeling of racist prejudice into you, and if
you were the kind of person who read extremely widely and was
sufficiently intelligent to be able to understand the most incredibly
advanced ideas relating to psychology, and particularly the psychology
of motivation, AND if you had the power to quickly undo that prejudice
that had been instilled into you ..... would you, at that point, decide
to get rid of it, or would you just say "I like the racist me" and keep it?

If you had empathic feelings for anyone at all (if you were a racist,
this would be for your own race), then I think you would understand the
idea that there is something wrong with narrow empathy combined with
unreasoned racism, and I think you would take action to eliminate the bias.

(As noted above, I don't consider this the /most/ likely scenario, but
neither do I consider it exceedingly unlikely.)

I'll pose a counter-example: if you were to wake up one night and
realize that evolution had crafted you with a deep feeling of
preferring the well-being of your family and children above that of
others, and you were superintelligent and knew everything about
psychology and could easily remove that unfair preference from your
mind... would you say "it's unfair that these people, chosen
effectively at random, receive more benefits from me than anybody
else, so I shall correct the matter at once", or say "I love my family
and children, and would never do anything that might cause me to treat
them worse"?

Again, ethics is axiomatic. Just because you acknowledge that others
might suffer too, doesn't /necessarily/ mean that you'd give their
suffering any weight. Also, there are plenty of people who do things
which they acknowledge are wrong - but keep doing anyway, and wouldn't
change themselves if they could. And they don't need to be complete
sociopaths who have no empathic feelings towards anyone at all.

To me, this sounds like a variation of the old "isn't it impossible to
build AGIs to be friendly, since they could always remove any unwanted
tampering from themselves" argument - it assumes desires and emotional
structures formed by evolution which don't need to be present in a
custom-built AGI. The traditional argument assumes that every mind
must want to be totally free from outside influence, simply because
the influence exists. The version you've posed essentially assumes
that every mind that cares about a group of people has a tendency to
start caring about other groups of people, simply because the other
groups exist.




