The first three are fundamentally the same thing: you have an internal model, inputs, outputs, and a reward signal. The model determines the mapping from inputs to outputs, and the outputs and their associated reward values are used to optimize the model parameters for maximum reward. The choice of model dictates which problems can be learned easily and which cannot.
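That loop -- model, inputs, outputs, reward, parameter optimization -- can be sketched in a few lines. This is a minimal illustration of my own (the names and the hill-climbing update are assumptions, not anything from the thread), with the reward function passed in as an ordinary parameter:

```python
import random

def model(params, x):
    # The internal model: maps an input to an output via one weight.
    return params["w"] * x

def optimize(reward_fn, steps=2000, x=1.0, seed=0):
    # Hill climbing on the model parameters for maximum reward:
    # perturb the weight, keep the change only if reward improves.
    rng = random.Random(seed)
    params = {"w": 0.0}
    best = reward_fn(model(params, x))
    for _ in range(steps):
        candidate = {"w": params["w"] + rng.gauss(0, 0.1)}
        r = reward_fn(model(candidate, x))
        if r > best:
            params, best = candidate, r
    return params

# Reward is highest when the output matches some target value.
learned = optimize(lambda y: -abs(y - 0.5))
```

Because `optimize` takes `reward_fn` as an argument, changing the design goal means changing one small function rather than the learner itself.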
AIXI is just one particular type of model, whose optimization happens to be extremely computationally expensive because the model is extremely general -- overly general, I would say. Human beings, on the other hand, have a model which is highly optimized for the task at hand, eliminating many parameter choices that don't need to be learned, because our bodies and certain features of our environment are highly predictable. (For example, we are hardwired to process visual data efficiently.) Our model is biased toward the type of environment we live in, making its features much less computationally intensive to learn, at the expense of some generality and certain types of flexibility. Right now, AI is struggling to identify models which are correctly biased to minimize learning cost in real-world environments.

Compared to building entire working systems, it is less difficult to design reward functions that correctly align with our own goals. (Otherwise, reinforcement learning wouldn't be useful.) The difficulty lies instead in building systems that can effectively learn from those signals. Once we have a model capable of effectively learning real-world environments, we can swap out reward functions as design goals and constraints change, without redesigning the entire model. (The reward function is nothing more than the encoding of those goals and constraints into the language of mathematics.) This plug-and-play interchangeability is the real promise of reinforcement learning.

On Mon, Mar 3, 2014 at 10:46 AM, Piaget Modeler <[email protected]> wrote:

> Personally, I like #3 best.
>
> ~PM
>
> > Date: Mon, 3 Mar 2014 10:45:41 -0500
> > Subject: Re: [agi] "Reward" and "utility" are fundamentally the same
> > From: [email protected]
> > To: [email protected]
> >
> > There are different kinds of reinforcement learning.
> >
> > 1. The AIXI model. The agent does not know the utility function and
> > must learn it. It assumes the simplest model that fits observation.
> >
> > 2. The MIRI model. A powerful agent lives in a complex environment
> > with a simple and well-understood (but poorly designed) utility
> > function. It uses reasoning and thought experiments to predict which
> > actions will maximize future reward.
> >
> > 3. The animal model (including humans). A reward (or penalty) acts to
> > increase (or decrease) the frequency of behavior performed at time t
> > before the signal, with effect proportional to 1/t.
> >
> > 4. The practical AI model. The AI has no goals. Instead, its behavior
> > is continually updated by the humans controlling it to meet the
> > complex and poorly understood goals of the humans.
> >
> > --
> > -- Matt Mahoney, [email protected]

-------------------------------------------
AGI
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/21088071-f452e424
Modify Your Subscription: https://www.listbox.com/member/?member_id=21088071&id_secret=21088071-58d57657
Powered by Listbox: http://www.listbox.com
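Matt's item 3 (the animal model) is concrete enough to sketch: a reward reinforces each earlier behavior with weight proportional to 1/t, where t is the delay between the behavior and the signal. A minimal illustration of my own (the function and action names are hypothetical, not from the thread):

```python
from collections import defaultdict

def assign_credit(history, reward_time, reward):
    # Each past action gets credit proportional to 1/t, where t is
    # how long before the reward signal the action occurred.
    credit = defaultdict(float)
    for t, action in history:
        delay = reward_time - t
        if delay > 0:
            credit[action] += reward / delay
    return dict(credit)

# Actions taken at times 1, 9, and 10; the reward arrives at time 11.
history = [(1, "explore"), (9, "press_lever"), (10, "press_lever")]
credit = assign_credit(history, reward_time=11, reward=1.0)
# explore: 1/10 = 0.1; press_lever: 1/2 + 1/1 = 1.5
```

Behaviors close to the reward are reinforced far more strongly than distant ones, which is the hyperbolic credit-assignment rule the item describes.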
