No, it would not. An AI will "press its own buttons" only if those buttons are defined. In one sense you can say that Gödel's theorem is a proof of friendliness, since it implies there must always be at least one button that the AI cannot press.
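To make the gaming concrete, here is a toy Python sketch. The action names and payoff numbers are invented purely for illustration, not taken from anything in the thread: an agent scored only on perceived reward will prefer manipulating the judge over doing the task, which is exactly the divergence Abram describes below.

# Toy illustration: an agent maximizing *perceived* reward prefers
# manipulating the judge once manipulation pays better than real work.
# Actions and payoffs are assumptions chosen only for illustration.
ACTIONS = {
    # action: (true task quality gained, perceived reward gained)
    "do_task_well":      (1.0, 0.8),   # real work, imperfectly judged
    "impress_the_judge": (0.0, 1.0),   # pure manipulation, no real work
}

def pick_action(actions):
    """A reward-maximizing agent scores actions only by perceived reward."""
    return max(actions, key=lambda a: actions[a][1])

if __name__ == "__main__":
    choice = pick_action(ACTIONS)
    true_gain, perceived_gain = ACTIONS[choice]
    print("agent picks:", choice)
    print("perceived reward:", perceived_gain, "/ true task quality:", true_gain)
    # Prints "impress_the_judge" with perceived reward 1.0 and true
    # quality 0.0: the proxy and the goal have come apart.

The point is only that nothing in the maximization step ever consults true task quality, so the proxy and the actual goal diverge as soon as manipulating the evaluator is cheaper than doing the work.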
- Ian Parker

On 4 July 2010 16:43, Abram Demski <[email protected]> wrote:
> Joshua,
>
> But couldn't it game the external utility function by taking actions which
> modify it? For example, if the suggestion is taken literally and you have a
> person deciding the reward at each moment, an AI would want to focus on
> making that person *think* the reward should be high, rather than focusing
> on actually doing well at whatever task it's set...and the two would tend to
> diverge greatly for more and more complex/difficult tasks, since these tend
> to be harder to judge. Furthermore, the AI would be very pleased to knock
> the human out of the loop and push its own buttons. Similar comments would
> apply to automated reward calculations.
>
> --Abram
