The first three are fundamentally the same thing. You have an internal model,
inputs, outputs, and a reward signal. The model determines the mapping from
inputs to outputs. The outputs and their associated reward signal values
are used to optimize the model parameters for maximum reward. The choice of
model dictates which problems can be easily learned and which cannot.
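To make that loop concrete, here is a minimal sketch (illustrative only, not any particular library's API): a one-parameter model mapping inputs to outputs, a reward signal, and a crude hill-climbing optimizer that keeps a parameter change only when it earns more reward. The target value 3.0 and the Gaussian perturbation size are arbitrary choices for the toy problem.

```python
import random

def act(params, x):
    # The model: maps input x to an output via one learned weight.
    return params["w"] * x

def reward(y, desired):
    # The reward signal: higher when the output is closer to desired.
    return -abs(y - desired)

def train(steps=2000, target=3.0):
    params = {"w": 0.0}
    for _ in range(steps):
        x = random.uniform(-1.0, 1.0)
        # Propose a small random change to the parameters.
        candidate = {"w": params["w"] + random.gauss(0.0, 0.1)}
        # Keep the change only if it increases reward (hill climbing).
        if reward(act(candidate, x), target * x) > reward(act(params, x), target * x):
            params = candidate
    return params

print(train())  # the learned weight should land near the target, 3.0
```

The model, the reward, and the optimizer are independent pieces; everything else in the discussion is about which of the three you hold fixed and which you have to learn.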

AIXI is just one particular type of model, whose optimization happens to be
extremely computationally expensive because the model is extremely general
-- overly general, I would say. Human beings, on the other hand, have a
model which is highly optimized for the task at hand, eliminating many
parameter choices that don't need to be learned because our bodies and
certain features of our environment are highly predictable. (For example,
we are hardwired to process visual data efficiently.) Our model is biased
towards the type of environment we live in, making learning its features
much less computationally intensive at the expense of some generality and
certain types of flexibility.

Right now, AI is struggling to identify models which are correctly biased
to minimize learning cost in real-world environments. Compared to building
entire working systems, it is less difficult to design reward functions
that correctly align with our own goals. (Otherwise, reinforcement
learning wouldn't be useful.) The difficulty lies instead in building
systems that can effectively learn from those signals. Once we have a
model capable of effectively learning real-world environments, we can swap
out reward functions as design goals and constraints change, without
redesigning the entire model. (The reward function is nothing more than
the encoding of those goals and constraints into the language of
mathematics.) This plug-and-play interchangeability is the real promise of
reinforcement learning.
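That interchangeability is easy to demonstrate with a toy learner (again illustrative only): the model and optimizer are fixed, and the reward function is passed in as an argument, so changing the design goal means changing one function. The two example goals below, and the penalty weight of 10, are made up for the sake of the demonstration.

```python
import random

def train(reward_fn, steps=3000):
    # Generic learner: single weight, hill climbing. Nothing here
    # depends on what the reward function actually encodes.
    w = 0.0
    for _ in range(steps):
        candidate = w + random.gauss(0.0, 0.1)
        if reward_fn(candidate) > reward_fn(w):
            w = candidate
    return w

# Two different design goals, each encoded as a reward function.
prefer_five = lambda w: -(w - 5.0) ** 2                   # goal: w near 5
prefer_small = lambda w: -(w - 5.0) ** 2 - 10.0 * abs(w)  # same goal, plus a penalty on magnitude

print(train(prefer_five))   # should land near 5.0
print(train(prefer_small))  # the penalty dominates; should stay near 0.0
```

Same learner, two different behaviors, and the only thing that changed is the mathematical encoding of the goal.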


On Mon, Mar 3, 2014 at 10:46 AM, Piaget Modeler
<[email protected]> wrote:

> Personally, I like #3 best.
>
> ~PM
>
> > Date: Mon, 3 Mar 2014 10:45:41 -0500
> > Subject: Re: [agi] "Reward" and "utility" are fundamentally the same
> > From: [email protected]
> > To: [email protected]
>
> >
> > There are different kinds of reinforcement learning.
> >
> > 1. The AIXI model. The agent does not know the utility function and
> > must learn it. It assumes the simplest model that fits observation.
> >
> > 2. The MIRI model. A powerful agent lives in a complex environment
> > with a simple and well understood (but poorly designed) utility
> > function. It uses reasoning and thought experiments to predict which
> > actions will maximize future reward.
> >
> > 3. The animal model (including humans). A reward (or penalty) acts to
> > increase (or decrease) the frequency of behavior performed at time t
> > before the signal with effect proportional to 1/t.
> >
> > 4. The practical AI model. The AI has no goals. Instead, its behavior
> > is continually updated by the humans controlling it to meet the
> > complex and poorly understood goals of the humans.
> >
> > --
> > -- Matt Mahoney, [email protected]
> >
> >
> > -------------------------------------------
> > AGI
> > Archives: https://www.listbox.com/member/archive/303/=now
> > RSS Feed:
> https://www.listbox.com/member/archive/rss/303/19999924-4a978ccc
> > Modify Your Subscription: https://www.listbox.com/member/?&;
>
> > Powered by Listbox: http://www.listbox.com
>



