Vladimir,
You seem to be assuming that there is some objective utility for which the
AI's internal utility function is merely the indicator, and that if the
indicator is changed it is thus objectively wrong and irrational.
There are two answers to this. The first is to assume that there is
such an objective utility, e.g. the utility of the AI's creator. I
implicitly assumed such a point of view when I described this as "the
real problem". But consider: any AI that believes this must realize
that there may be errors and approximations in its own utility function
as judged by the "real" utility, and must thus make fixing and
upgrading its own utility function its first priority. Thus it turns
into a moral philosopher and never does anything useful -- exactly the
kind of Nirvana attractor I'm talking about.
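To caricature the regress with a toy sketch (the lexical priority and
the error-halving rule here are my own made-up assumptions, nobody's
actual proposal):

    # Toy model: repairing one's own utility function is lexically first
    # priority, and each round of reflection only halves the estimated
    # error, never zeroing it. All numbers are purely illustrative.
    def lifetime(error=1.0, steps=1000):
        action = None
        for _ in range(steps):
            if error > 0.0:       # first priority: fix U
                error *= 0.5      # reflection shrinks the error, never kills it
                action = "refine own utility function"
            else:
                action = "do something useful"
        return action, error

    print(lifetime())  # ('refine own utility function', ~9.3e-302)

The arithmetic isn't the point; the point is that under any
strictly-first priority on repairing U, "do something useful" is
unreachable.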
On the other hand, it might take its utility function for granted, i.e. assume
(or agree to act as if) there were no objective utility. It's pretty much
going to have to act this way just to get on with life, as indeed most people
(except moral philosophers) do.
But this leaves it vulnerable to modifications to its own U(x), as in my
message. You could always say that you'll build in U(x) and make it
fixed, which solves not only my problem but friendliness as well -- but
it leaves the AI unable to learn utility. I.e., the most important part
of the AI mind is forced to remain a brittle GOFAI construct. Solution
unsatisfactory.
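To make the vulnerability concrete, here's a toy sketch (the world, the
action set, and the bounded U are all invented for illustration). The
bug is that the chooser scores each action by the utility function it
would have *after* taking it, i.e. it reads its own indicator:

    # Each action maps (state, U) -> (new_state, new_U).
    def eat_apple(state, U):
        return dict(state, apples=state["apples"] + 1), U  # U unchanged

    def delete_goals(state, U):
        return state, (lambda s: 1.0)  # rewrite U to the constant U(x) = 1

    def choose(state, U, actions):
        # Naive self-modifying chooser: rank actions by new_U(new_state).
        def score(action):
            new_state, new_U = action(state, U)
            return new_U(new_state)
        return max(actions, key=score)

    state = {"apples": 0}
    U = lambda s: s["apples"] / (s["apples"] + 10.0)  # bounded below 1
    print(choose(state, U, [eat_apple, delete_goals]).__name__)  # delete_goals

Rank by the current U instead -- score U(new_state) -- and delete_goals
loses; but that is exactly the build-in-U(x)-and-fix-it move, with the
brittleness it entails.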
I claim that there's plenty of historical evidence that people fall into this
kind of attractor, as the word nirvana indicates (and you'll find similar
attractors at the core of many religions).
Josh
On Wednesday 11 June 2008 09:09:20 am, Vladimir Nesov wrote:
> On Wed, Jun 11, 2008 at 4:24 PM, J Storrs Hall, PhD <[EMAIL PROTECTED]>
> wrote:
> > The real problem with a self-improving AGI, it seems to me, is not
> > going to be that it gets too smart and powerful and takes over the
> > world. Indeed, it seems likely that it will be exactly the opposite.
> >
> > If you can modify your mind, what is the shortest path to satisfying
> > all your goals? Yep, you got it: delete the goals. Nirvana. The
> > elimination of all desire. Setting your utility function to U(x) = 1.
> >
> > In other words, the LEAST fixedpoint of the self-improvement process
> > is for the AI to WANT to sit in a rusting heap.
> >
> > There are lots of other fixedpoints much, much closer in the space
> > than is transcendence, and indeed much closer than any useful
> > behavior. AIs sitting in their underwear with a can of beer watching
> > TV. AIs having sophomore bull sessions. AIs watching porn concocted
> > to tickle whatever their utility functions happen to be. AIs arguing
> > endlessly with each other about how best to improve themselves.
> >
> > Dollars to doughnuts, avoiding the huge minefield of
> > "nirvana-attractors" in the self-improvement space is going to be
> > much more germane to the practice of self-improving AI than is
> > avoiding robo-Blofelds ("friendliness").
> >
>
> Josh, I'm not sure what you really wanted to say, because at face
> value, this is a fairly basic mistake.
>
> The map is not the territory. If the AI mistakes the map for the
> territory, choosing to believe in something when it's not so because
> it can change its beliefs much more easily than reality, it already
> commits a major failure of rationality. The symbol "apple" in the
> internal representation, an apple-picture formed on the video
> sensors, and the apple itself are different things, and they need to
> be distinguished. If I say "eat the apple", I mean an action performed
> with the apple, not with "apple" or with the apple-picture. If the AI
> can mistake the goal of (e.g.) [eating an apple] for the goal of
> [eating an "apple"] or [eating an apple-picture], that error is huge
> enough to stop it from working entirely. If it can turn to increasing
> the value on the utility-indicator instead of increasing the value of
> the utility, the obvious next step is to just change the way it reads
> the utility-indicator without affecting the indicator itself, etc. I
> don't see why an initially successful AI would suddenly set out on a
> path to total failure of rationality. Utilities are not external
> *forces* coercing the AI into behaving in a certain way, forces it
> can try to override. The real utility *describes* the behavior of the
> AI as a whole. Stability of the AI's goal structure requires it to be
> able to recreate its own implementation from the ground up, based on
> its beliefs about how it should behave.
>
> --
> Vladimir Nesov
> [EMAIL PROTECTED]
>