I'm in full agreement: Reinforcement learning is a generalization of
supervised learning, making supervision *optional*.
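
For instance, supervised learning drops out as the special case where
the reward is just the negative loss on each labeled example. A toy
numpy sketch (data and dimensions made up for illustration):

import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])    # hidden "supervisor"
w = np.zeros(3)                        # the agent's parameters

for step in range(2000):
    x = rng.normal(size=3)             # observation
    action = w @ x                     # the agent's action: a prediction
    y = true_w @ x                     # label the supervisor has in mind
    reward = -(action - y) ** 2        # supervision, squeezed into a scalar
    # Gradient ascent on the reward (analytic here for brevity; a pure
    # RL agent would have to estimate this from reward samples alone):
    w += 0.01 * 2 * (y - action) * x

print(np.round(w, 2))                  # close to true_w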

For both reinforcement learning and self-modification, Matt is overlooking
the information available directly from the environment. It is this
information that makes learning algorithms useful in general. As a specific
example, one of the reasons Deep Learning has gained so much traction lately
is that you can train your system on unlabeled data, using relatively cheap
and abundant CPU cycles rather than investing enormous amounts of costly
and scarce human labor.
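
To make the unlabeled-data point concrete, here is a toy numpy sketch -
a tiny linear autoencoder whose only training signal is the raw input
itself (sizes and constants made up, not any particular system):

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 8))          # 500 unlabeled 8-dim examples
W_enc = rng.normal(scale=0.1, size=(8, 3))   # encoder: 8 -> 3
W_dec = rng.normal(scale=0.1, size=(3, 8))   # decoder: 3 -> 8
lr = 0.05

for epoch in range(300):
    H = X @ W_enc                      # compressed codes
    err = H @ W_dec - X                # reconstruction error
    # Gradient steps on the squared error (constants folded into lr):
    W_dec -= lr * H.T @ err / len(X)
    W_enc -= lr * X.T @ (err @ W_dec.T) / len(X)

print(np.mean(err ** 2))               # falls as training proceeds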


On Sat, Mar 1, 2014 at 6:40 AM, Tim Tyler <[email protected]> wrote:

>  On 28/02/2014 11:06, Ben Goertzel wrote:
>
>  See
> http://hplusmagazine.com/2014/02/28/saving-the-world-with-analytical-philosophy/
>
> ;-)
>
>
> There, Matt writes:
>
> "Reinforcement learning is slow because a reward signal transmits fewer
> bits to a complex system than updating the code or giving explicit
> directions.
> And self-modification would add no bits at all. MIRI needs to explain why
> this
> will change in the future."
>
> This seems like a debate about what "reinforcement learning" means.
> Typically, a reward signal
> is not the only information that a "reinforcement learning agent" gains
> from its environment
> while it is learning. Such agents typically also have a conventional
> sensory array - from which
> they can receive "explicit directions" and other forms of input. If an
> agent has previously
> learned that "explicit directions" offer a useful shortcut to receiving
> rewards, then it will
> pay a good deal of attention to them.
>
> If you look on Wikipedia, you will probably see things like:
> "Reinforcement learning differs from standard supervised learning
> <http://en.wikipedia.org/wiki/Supervised_learning> in that correct
> input/output pairs are never presented, nor sub-optimal actions
> explicitly corrected."
>
> However, that sort of distinction is (arguably) not a very useful one
> for machine intelligence enthusiasts. In practice, learning agents
> typically learn that supervisors are worth following and corrections
> are worth heeding *via* reinforcement learning.
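>
> For example, here is a minimal toy sketch (a made-up problem, not from
> any particular system): one-step tabular Q-learning in which the
> observation carries "explicit directions", and the agent learns from
> the scalar reward alone that those directions are worth heeding.
>
> import numpy as np
>
> rng = np.random.default_rng(0)
> n_actions = 4
> Q = np.zeros((n_actions, n_actions))   # Q[advice, action]
> lr, eps = 0.1, 0.1
>
> for step in range(5000):
>     good = rng.integers(n_actions)     # hidden rewarding action
>     advice = good                      # the "explicit direction"
>     if rng.random() < eps:
>         action = rng.integers(n_actions)       # explore
>     else:
>         action = int(np.argmax(Q[advice]))     # exploit
>     r = 1.0 if action == good else 0.0         # scalar reward only
>     Q[advice, action] += lr * (r - Q[advice, action])
>
> # The learned policy now heeds the directions in every case:
> print(all(np.argmax(Q[a]) == a for a in range(n_actions)))  # True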
>
> Basically, it isn't true that a "reinforcement learning agent" can't
> use supervised learning, or make use of correction. Nor is it true
> that it has an impoverished rate of learning due to its limited scalar
> reward signal. This whole business seems like a muddle derived from
> attempting to distinguish the various types of learning.
> Reinforcement learning isn't best seen as an alternative to supervised
> learning - rather, reinforcement underpins *all* types of learning.
> It's the "universal currency" of learning - in the same way that
> fitness is the "universal currency" of evolution, money is the
> "universal currency" of economics, or utility is the "universal
> currency" of utilitarians.
>
> Nor does it make much theoretical difference whether the "goodness
> scalar" involved in the "universal currency" comes from the
> environment directly, or is synthesized internally from sensory data
> and current state using a "utility function". Alas, an awful lot of
> hot air seems to surround this last point.
> --
> __________
>  |im |yler  http://timtyler.org/  [email protected]  Remove lock to reply.
>


