Terren Suydam <[EMAIL PROTECTED]> was quoted to say:
>I've been saying that Friendliness is impossible to implement because 1)
>it's a moving target (as in, changes through time), since 2) its definition
>is dependent on context (situational context, cultural context, etc).
I think that Friendliness is doing what people want, and an AI would be
able to infer what people want using a process similar to the one people
use to infer what each other want.  What people want depends on context,
but the process by which one could infer what people want does not depend
on context.  Different people want different things, so the AI uses a
weighted average of their utilities as its utility.  Surely you could
land somewhere in Mongolia, watch people behave, and estimate who wants
what.  This problem is solvable, in principle.

>In other words, Friendliness is not something that can be
>hardwired. It can't be formalized, coded, ...

Great!  There's a decision procedure at
http://www.fungible.com/respect/code/index.html.  If you don't understand
it, read the paper at http://www.fungible.com/respect/paper.html.  Tell
me why it's wrong.

Well, actually, I know it's wrong, but the bugs appear fixable.  The bugs
are of the "oops" variety rather than the "the concept is meaningless"
variety.  The known bugs are:

* It is subject to a common logical fallacy I call "The Novice
  Philosopher Problem"; see
  http://www.fungible.com/respect/paper.html#novice-philosopher-problem.
  My bogus misdefinition of the word "probability" should probably go
  away, since solving this issue will probably require moving to real
  probabilities.

* It takes a planning horizon as a parameter, and only at the end of the
  planning horizon does it give people what they want.  The estimate of
  what people want already takes into account the naturally occurring,
  irregular human planning horizon.  So if (hypothetically) you're a
  crack addict who wants your fix right now, and the AI is planning a
  year at a time, you won't be happy with what it does for you.  The
  delayed gratification is a spurious implicit statement that long-term
  planning is morally superior to short-term planning, so to minimize
  conflict with the short-sighted the algorithm should always be run with
  one timestep as the planning horizon.
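The proposal so far -- the AI's utility is a weighted average of each
person's estimated utility, evaluated one timestep at a time -- can be
sketched roughly as below.  This is a toy illustration of the idea, not
the code at fungible.com; all names, and the assumption that per-person
utilities can be predicted per action, are invented for the example.

```python
def aggregate_utility(utilities, weights):
    """Weighted average of per-person utilities.

    utilities and weights are dicts keyed by person; weights need not
    sum to 1, since we normalize here.
    """
    total_weight = sum(weights.values())
    return sum(weights[p] * utilities[p] for p in utilities) / total_weight

def choose_action(actions, predict_utilities, weights):
    """Pick the action whose one-timestep outcome maximizes the
    weighted-average utility.

    predict_utilities(action) -> {person: estimated utility one
    timestep after taking that action}.  Using a single timestep
    reflects the suggestion above to avoid privileging long-term
    planning over short-term planning.
    """
    return max(actions,
               key=lambda a: aggregate_utility(predict_utilities(a), weights))

# Toy example: two people with partly conflicting preferences.
weights = {"alice": 1.0, "bob": 1.0}
outcomes = {
    "act_x": {"alice": 1.0, "bob": 0.0},   # average 0.5
    "act_y": {"alice": 0.4, "bob": 0.9},   # average 0.65
}
best = choose_action(list(outcomes), lambda a: outcomes[a], weights)
# best is "act_y", the compromise that maximizes the average.
```

The hard part, of course, is `predict_utilities` -- the inference of who
wants what from observed behavior -- which this sketch simply takes as
given.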
* It's not very good at dealing with personal indebtedness.  For
  example, suppose the AI has been told to respect people enough that it
  won't consider stealing.  The present code would only go shopping if
  it could pick out the merchandise and then pay for it all within the
  planning horizon.  This doesn't interact well with the previous issue,
  where we concluded the planning horizon should be very short.  I think
  the right fix here is to maintain a debt per person and then to define
  respect in terms of people getting what they are owed.  The present
  scheme is like the desired scheme, except that in the present scheme
  the debt is always zero.

> It can't be ... designed, implemented ...

I agree that nobody has done that yet.

> It can't be ... proved.

Proving that the decision procedure would actually produce behavior I
like presents a logical puzzle: the decision procedure itself is the
only formal description of what I like that I have available, so what
is there to prove?  I wish I knew a better approach to this.

-- 
Tim Freeman               http://www.fungible.com
[EMAIL PROTECTED]

-------------------------------------------
agi
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/
