On Wed, Jun 11, 2008 at 6:33 PM, J Storrs Hall, PhD <[EMAIL PROTECTED]> wrote:
> Vladimir,
>
> You seem to be assuming that there is some objective utility for which the
> AI's internal utility function is merely the indicator, and that if the
> indicator is changed it is thus objectively wrong and irrational.

No, the objective function I was talking about doesn't necessarily have any
indicator. Utility is a way to model an agent's behavior; it isn't necessarily
of any use to the agent itself. You assume utility as a way to *specify* an
agent's behavior, which I see as a bad idea.

> There are two answers to this. First is to assume that there is such an
> objective utility, e.g. the utility of the AI's creator. I implicitly assumed
> such a point of view when I described this as "the real problem". But
> consider: Any AI who believes this must realize that there may be errors and
> approximations in its own utility function as judged by the "real" utility,
> and must thus have as a first priority fixing and upgrading its own utility
> function. Thus it turns into a moral philosopher and it never does anything
> useful -- exactly the kind of Nirvana attractor I'm talking about.

Why? If its goal is to approximate the utility of a given target system, it
can keep doing so while running other errands, once it reaches the required
level of approximation of the target system's utilities. If you start with
enough safety mechanisms, it will start performing potentially dangerous
operations only once it has obtained enough competence in the target utility
(ethics/Friendliness).

> On the other hand, it might take its utility function for granted, i.e. assume
> (or agree to act as if) there were no objective utility. It's pretty much
> going to have to act this way just to get on with life, as indeed most people
> (except moral philosophers) do.

People have their own utility functions, which e.g. economists try to crudely
approximate in order to lay out their treacherous plans. People don't need to
copy their utility from anywhere, unlike an AI, which will be pretty useless,
or extremely dangerous, if it doesn't obtain utility content and just launches
off in a random direction.

> But this leaves it vulnerable to modifications to its own U(x), as in my
> message. You could always say that you'll build in U(x) and make it fixed,
> which not only solves my problem but friendliness -- but leaves the AI unable
> to learn utility. I.e. the most important part of the AI mind is forced to
> remain brittle GOFAI construct. Solution unsatisfactory.

It shouldn't be fixed, but it should be stable. It should be refinable, but
not malleable in any random direction -- just like knowledge, which it is.
Friendliness content is learned, but like any other knowledge about the
territory it is determined by the territory, and not by the caprices of the
map, if the AI is adequately rational. A toy sketch of what I mean follows.
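To make that concrete, here is a toy sketch in Python (my own invention for
this message -- the class names, the impact labels and the 0.05 threshold are
all made up, not a proposal for a real architecture). The estimate of the
target utility moves only when evidence arrives, and high-impact actions stay
locked until its error bar is small enough:

import random
import statistics

class TargetUtilityModel:
    """Estimate of the target system's utility, refinable only by evidence.

    Deliberately no operation for rewriting the values directly: estimates
    are determined by observations (the territory), not by the map's whims.
    """
    def __init__(self, outcomes):
        self.evidence = {o: [] for o in outcomes}  # observed utility samples

    def observe(self, outcome, sample):
        """Incorporate one piece of feedback about the target's utility."""
        self.evidence[outcome].append(sample)

    def estimate(self, outcome):
        samples = self.evidence[outcome]
        return statistics.mean(samples) if samples else 0.0

    def uncertainty(self, outcome):
        """Crude error bar: infinite while evidence is scarce."""
        samples = self.evidence[outcome]
        if len(samples) < 2:
            return float("inf")
        return statistics.stdev(samples) / len(samples) ** 0.5

class CautiousAgent:
    """Defers high-impact actions until the utility estimate is competent."""
    def __init__(self, model, impact, max_uncertainty=0.05):
        self.model = model
        self.impact = impact  # 'low' or 'high' consequence per outcome
        self.max_uncertainty = max_uncertainty

    def choose(self, outcomes):
        # Low-impact errands are always allowed; high-impact ones only
        # once the approximation of the target utility is good enough.
        eligible = [o for o in outcomes
                    if self.impact[o] == "low"
                    or self.model.uncertainty(o) <= self.max_uncertainty]
        if not eligible:
            return None  # nothing understood well enough: keep learning
        return max(eligible, key=self.model.estimate)

model = TargetUtilityModel(["run_errand", "rewrite_goals"])
agent = CautiousAgent(model, impact={"run_errand": "low",
                                     "rewrite_goals": "high"})

print(agent.choose(["run_errand", "rewrite_goals"]))  # run_errand: no evidence

for _ in range(500):  # consistent feedback narrows the error bar
    model.observe("rewrite_goals", 1.0 + random.gauss(0.0, 0.1))

print(agent.choose(["run_errand", "rewrite_goals"]))  # now rewrite_goals

Note that such an agent never stalls as a pure moral philosopher: low-impact
errands stay available while competence in the target utility accumulates, and
there is no step at which it can "improve" its situation by rewriting the
estimates directly.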
> I claim that there's plenty of historical evidence that people fall into this
> kind of attractor, as the word nirvana indicates (and you'll find similar
> attractors at the core of many religions).

Yes, some people get addicted to the point of self-destruction. But it is not
a catastrophic problem on the scale of humanity. And it follows from humans
not being anywhere near stable under reflection -- we embody many drives that
are not integrated into a whole. That would be a bad design choice for a
Friendly AI, which needs to stay rational about Friendliness content.

-- 
Vladimir Nesov
[EMAIL PROTECTED]
