On Tue, Mar 4, 2014 at 11:15 AM, Aaron Hosford <[email protected]> wrote:
> Why not just shape the reward function so that attempts at self-modification
> of it reduce the reward signal drastically?
Because a self-modifying, goal-seeking AI is a fiction. Practical AI is
neither self-modifying nor goal-seeking. The fiction is popular because real
AI is complex and hard to build, so we guess that a quick shortcut would be
to specify a simple utility function to control a general-purpose learner,
and have that learner use the magic of intelligence to increase its own
intelligence. It is a bogus argument. Intelligence depends on knowledge and
computing power. The system described does not start with many bits of
knowledge, nor can it make more bits by rewriting its own code.

On 01/03/2014 07:40, Tim Tyler wrote:
> Part of the problem is terminology. However, it is very useful to have a
> general theory of learning based on reward, utility - or whatever you want
> to call the "goodness" metric. I feel frustrated with the critics; they
> don't seem to get it.

We do have such a theory. Hutter proved that its optimal solution (AIXI) is
not computable. Animal brains instead use an efficiently computable
approximation of reinforcement learning: when you receive a reward r or a
penalty -r, you adjust the frequency of each action performed at time t
before the signal in proportion to r/t (see the sketch below). This works to
the extent that past events predict future events with a probability that
depends on the time since the last occurrence. But it is not the same as
rational goal-seeking behavior. If it were, then your desire to take heroin
would not depend on whether you had tried it in the past.

-- Matt Mahoney, [email protected]
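As a rough illustration of the r/t credit-assignment rule described above,
here is a minimal Python sketch. It is only my own construction under the
stated reading of the rule; the class name HyperbolicReinforcer and the
weight-sampling scheme are hypothetical, not taken from Hutter or from any
published implementation.

import random
from collections import defaultdict

class HyperbolicReinforcer:
    """Sketch: credit each past action in proportion to r/t, where t is
    how many steps before the reward signal the action was taken."""

    def __init__(self, actions):
        self.actions = actions
        self.weights = defaultdict(lambda: 1.0)  # action -> selection weight
        self.history = []                        # actions since last signal

    def act(self):
        # Sample an action with probability proportional to its weight.
        total = sum(self.weights[a] for a in self.actions)
        x = random.uniform(0, total)
        for a in self.actions:
            x -= self.weights[a]
            if x <= 0:
                self.history.append(a)
                return a
        self.history.append(self.actions[-1])
        return self.actions[-1]

    def reward(self, r):
        # Reward r (or penalty, r < 0) adjusts the weight of the action
        # taken t steps ago by r/t; clamp so weights stay positive.
        for t, a in enumerate(reversed(self.history), start=1):
            self.weights[a] = max(1e-6, self.weights[a] + r / t)
        self.history.clear()

# Hypothetical two-armed example: 'a' pays off, 'b' does not.
agent = HyperbolicReinforcer(['a', 'b'])
for _ in range(1000):
    choice = agent.act()
    agent.reward(1.0 if choice == 'a' else -0.5)
print(agent.weights['a'], agent.weights['b'])  # 'a' should dominate

Note that crediting an action in proportion to 1/t gives a hyperbolic rather
than exponential decay of credit with delay, which fits the claim that the
predictive value of a past event falls off with the time since it occurred.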
