On Tue, Mar 4, 2014 at 11:15 AM, Aaron Hosford <[email protected]> wrote:
> Why not just shape the reward function so that attempts at self-modification 
> of it reduce the reward signal drastically?

Because self-modifying goal-seeking AI is a fiction. Practical AI is
neither self-modifying nor driven by a single utility function.

It is a popular fiction because real AI is complex and hard to build,
so we imagine a shortcut: specify a simple utility function to control
a general-purpose learner, and have that learner use the magic of
intelligence to increase its own intelligence.

It is a bogus argument. Intelligence depends on knowledge and
computing power. The system described starts with few bits of
knowledge, and it cannot create more bits by rewriting its own code.

On 01/03/2014 07:40, Tim Tyler wrote:
> Part of the problem is terminology. However, it is very useful to have a
> general theory of learning based on reward, utility - or whatever you want
> to call the "goodness" metric. I feel frustrated with the critics; they
> don't seem to get it.

We do have such a theory. Hutter proved that the optimal agent it
describes, AIXI, is not computable.

Animal brains use an efficiently computable approximation of
reinforcement learning. When you receive a reward r or penalty -r, you
increase the frequency of the actions performed t time steps before
the signal in proportion to r/t. This works to the extent that past
events predict future events with a probability that depends on the
time since the last occurrence. But it is not the same as rational
goal-seeking behavior. If it were, then your desire to take heroin
would not depend on whether you had tried it in the past.
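To make the rule concrete, here is a minimal sketch of that r/t
credit-assignment scheme. Everything here (class name, weights,
learning rate) is illustrative on my part, not from any particular
brain model: after a reward, each recent action's propensity is
adjusted in proportion to r divided by how long ago it was taken.

```python
import random

class SimpleReinforcer:
    """Sketch of r/t credit assignment: on reward r, each action taken
    t steps before the signal has its propensity adjusted by lr * r / t.
    All names and parameters are hypothetical, for illustration only."""

    def __init__(self, actions, learning_rate=0.1):
        self.weights = {a: 1.0 for a in actions}  # action propensities
        self.history = []   # (action, time_step) pairs since last signal
        self.clock = 0
        self.lr = learning_rate

    def act(self):
        # Sample an action with probability proportional to its weight.
        actions, weights = zip(*self.weights.items())
        choice = random.choices(actions, weights=weights)[0]
        self.clock += 1
        self.history.append((choice, self.clock))
        return choice

    def reward(self, r):
        # Credit each recent action in proportion to r / (steps elapsed),
        # so actions closer to the signal change more. Weights are kept
        # positive so sampling remains valid.
        for action, when in self.history:
            t = self.clock - when + 1  # steps between action and signal
            self.weights[action] = max(1e-6,
                self.weights[action] + self.lr * r / t)
        self.history.clear()
```

Note that this mechanism only strengthens behaviors it has actually
emitted and been rewarded for, which is the point of the heroin
example: it cannot "want" an outcome it has never experienced.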

-- 
-- Matt Mahoney, [email protected]

