On Wed, Jun 11, 2008 at 6:33 PM, J Storrs Hall, PhD <[EMAIL PROTECTED]> wrote:
> Vladimir,
>
> You seem to be assuming that there is some objective utility for which the
> AI's internal utility function is merely the indicator, and that if the
> indicator is changed it is thus objectively wrong and irrational.

No, the objective function I was talking about doesn't necessarily
have any indicator. Utility is a way to model an agent's behavior; it
isn't necessarily of any use to the agent itself. You assume utility
is a way to *specify* the agent's behavior, which I see as a bad idea.


> There are two answers to this. First is to assume that there is such an
> objective utility, e.g. the utility of the AI's creator. I implicitly assumed
> such a point of view when I described this as "the real problem". But
> consider: Any AI who believes this must realize that there may be errors and
> approximations in its own utility function as judged by the "real" utility,
> and must thus have as a first priority fixing and upgrading its own utility
> function. Thus it turns into a moral philosopher and it never does anything
> useful -- exactly the kind of Nirvana attractor I'm talking about.

Why? If its goal is to approximate the utility of a given subsystem,
it can work on that, and take up other errands once it reaches the
required level of approximation of the target system's utility. If you
start with enough safety mechanisms, it will perform potentially
dangerous operations only after it has obtained enough competency in
the target utility (ethics/Friendliness).
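The gating idea above can be sketched in a few lines. This is purely illustrative: the confidence measure, threshold, and class names are all invented here, not anyone's proposed design.

```python
# Hypothetical sketch: harmless errands are always allowed, but
# potentially dangerous operations are gated on how well the agent has
# approximated the target utility. Numbers are invented for illustration.

COMPETENCE_THRESHOLD = 0.95

class GatedAgent:
    def __init__(self):
        # How well the target utility is currently approximated (0..1).
        self.utility_confidence = 0.0

    def learn_from_feedback(self, gain):
        # Refining the approximation raises confidence, capped at 1.0.
        self.utility_confidence = min(1.0, self.utility_confidence + gain)

    def act(self, action, dangerous):
        if dangerous and self.utility_confidence < COMPETENCE_THRESHOLD:
            return "refused: insufficient competence in target utility"
        return "performed: " + action

agent = GatedAgent()
assert agent.act("tidy the lab", dangerous=False).startswith("performed")
assert agent.act("self-modify", dangerous=True).startswith("refused")
for _ in range(10):
    agent.learn_from_feedback(0.1)
assert agent.act("self-modify", dangerous=True).startswith("performed")
```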


> On the other hand, it might take its utility function for granted, i.e. assume
> (or agree to act as if) there were no objective utility. It's pretty much
> going to have to act this way just to get on with life, as indeed most people
> (except moral philosophers) do.

They have their own utility functions, which e.g. economists try to
crudely approximate to lay out their treacherous plans. People don't
need to copy them, unlike an AI, which will be pretty useless or
extremely dangerous if it doesn't obtain utility content and just
launches in a random direction.


> But this leaves it vulnerable to modifications to its own U(x), as in my
> message. You could always say that you'll build in U(x) and make it fixed,
> which not only solves my problem but friendliness -- but leaves the AI unable
> to learn utility. I.e. the most important part of the AI mind is forced to
> remain brittle GOFAI construct. Solution unsatisfactory.

It shouldn't be fixed, but it should be stable. It should be
refinable, but not malleable in any random direction -- just like
knowledge, which it is. Friendliness content is learned, but like any
other knowledge about the territory it is determined by the territory,
and not by the caprices of the map, if the AI is adequately rational.
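"Refinable but not malleable in any random direction" is the behavior of an ordinary converging estimator. A toy sketch (the estimator and numbers are invented for illustration, not a proposal): whatever prior the map starts from, the estimate ends up where the territory's observations put it, and no single observation can yank a mature estimate around.

```python
# Running-average estimator: each observation pulls the estimate toward
# the data, so it converges to what the territory reports regardless of
# the initial prior -- refinable, but stable against arbitrary nudges.

def refine(prior, observations):
    """Fold observations into a running mean seeded with the prior."""
    estimate, n = prior, 1
    for obs in observations:
        n += 1
        estimate += (obs - estimate) / n
    return estimate

territory = [0.8] * 200  # what the world actually reports, repeatedly

# Two very different "maps" (priors) converge to nearly the same place:
a = refine(0.0, territory)
b = refine(1.0, territory)
assert abs(a - 0.8) < 0.01 and abs(b - 0.8) < 0.01

# ...but one stray observation barely moves a mature estimate:
c = refine(0.0, territory + [5.0])
assert abs(c - a) < 0.03
```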


> I claim that there's plenty of historical evidence that people fall into this
> kind of attractor, as the word nirvana indicates (and you'll find similar
> attractors at the core of many religions).

Yes, some people get addicted to the point of self-destruction. But it
is not a catastrophic problem on the scale of humanity. And it follows
from humans not being nearly stable under reflection -- we embody many
drives which are not integrated into a whole. That would be a bad
design choice for a Friendly AI, if it needs to stay rational about
Friendliness content.


-- 
Vladimir Nesov
[EMAIL PROTECTED]


-------------------------------------------
agi
Archives: http://www.listbox.com/member/archive/303/=now
RSS Feed: http://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: 
http://www.listbox.com/member/?member_id=8660244&id_secret=103754539-40ed26
Powered by Listbox: http://www.listbox.com
