On 28/02/2014 11:06, Ben Goertzel wrote:
See
http://hplusmagazine.com/2014/02/28/saving-the-world-with-analytical-philosophy/
;-)
There, Matt writes:
"Reinforcement learning is slow because a reward signal transmits fewer
bits to a complex system than updating the code or giving explicit directions.
And self-modification would add no bits at all. MIRI needs to explain why this
will change in the future."
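As a back-of-envelope illustration of Matt's bits-per-step framing (the particular numbers here are my own assumptions, not his): a scalar reward drawn from k distinguishable levels carries at most log2(k) bits per time step, while an explicit direction over N possible actions can carry up to log2(N) bits per step.

```python
import math

def reward_bits_per_step(num_reward_levels=2):
    # A scalar reward with k distinguishable levels carries
    # at most log2(k) bits per time step.
    return math.log2(num_reward_levels)

def direction_bits_per_step(num_actions=16):
    # An explicit direction ("do action a") over N possible actions
    # can carry up to log2(N) bits per time step.
    return math.log2(num_actions)

print(reward_bits_per_step())     # 1.0 (binary reward)
print(direction_bits_per_step())  # 4.0 (16-action repertoire)
```

On these (assumed) figures the direction channel is four times as wide - which is the asymmetry Matt's argument turns on.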
This seems like a debate about what "reinforcement learning" means. Typically, a reward signal is not the only information that a "reinforcement learning agent" gains from its environment while it is learning. Such agents typically also have a conventional sensory array, through which they can receive "explicit directions" and other forms of input. If an agent has previously learned that "explicit directions" offer a useful shortcut to receiving rewards, then it will pay a good deal of attention to them.
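To make that concrete, here is a minimal toy sketch (my own construction, not anything from the post): a two-armed bandit learner whose sensory input includes a "hint" naming the paying arm. Purely from the scalar reward, it learns that the hint is worth following.

```python
import random

random.seed(0)

# Q-values indexed by (hint, action). The agent is never told the
# rule "follow the hint" - it only ever receives a scalar reward.
Q = {(h, a): 0.0 for h in (0, 1) for a in (0, 1)}
alpha, epsilon = 0.1, 0.1

for step in range(5000):
    good_arm = random.randint(0, 1)  # which arm pays off this round
    hint = good_arm                  # the "explicit direction" input
    if random.random() < epsilon:
        action = random.randint(0, 1)          # explore
    else:
        action = max((0, 1), key=lambda a: Q[(hint, a)])  # exploit
    reward = 1.0 if action == good_arm else 0.0
    Q[(hint, action)] += alpha * (reward - Q[(hint, action)])

# After training, the greedy policy follows the hint in both contexts.
print(max((0, 1), key=lambda a: Q[(0, a)]))  # 0
print(max((0, 1), key=lambda a: Q[(1, a)]))  # 1
```

The "supervision" arrives through an ordinary sensory channel; reinforcement is just what taught the agent to heed it.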
If you look on Wikipedia, you will probably see things like: "Reinforcement learning differs from standard supervised learning <http://en.wikipedia.org/wiki/Supervised_learning> in that correct input/output pairs are never presented, nor sub-optimal actions explicitly corrected."
However, that sort of distinction is (arguably) not a very useful one for machine intelligence enthusiasts. In practice, learning agents typically learn that supervisors are worth following and corrections are worth heeding *via* reinforcement learning. Basically, it isn't true that a "reinforcement learning agent" can't use supervised learning, or make use of correction. Nor is it true that it has an impoverished rate of learning because of its limited scalar reward signal. This whole business seems like a muddle derived from attempting to distinguish the various different types of learning.
Reinforcement learning isn't best seen as an alternative to supervised learning; rather, reinforcement underpins *all* types of learning. It's the "universal currency" of learning, in the same way that fitness is the "universal currency" of evolution, money is the "universal currency" of economics, or utility is the "universal currency" of utilitarians.
Nor does it make very much theoretical difference whether the "goodness scalar" involved in the "universal currency" comes from the environment directly, or is synthesized internally from sensory data and current state using a "utility function".
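A minimal sketch of that last point (my own construction, with hypothetical helper names): the learner's update rule is identical whether the scalar arrives from the environment or is computed internally by a utility function over raw observations.

```python
# The learner only ever consumes a scalar "goodness" signal; where
# that scalar comes from is invisible to the update rule.

def external_reward(env_signal):
    # Reward supplied directly by the environment.
    return env_signal

def internal_reward(sensory_data, utility):
    # Reward synthesized from sensory data by a utility function.
    return utility(sensory_data)

def td_update(value, reward, alpha=0.5):
    # The same simple value update in both cases.
    return value + alpha * (reward - value)

utility = lambda obs: sum(obs) / len(obs)  # a toy internal utility

v = 0.0
v = td_update(v, external_reward(1.0))                   # env-supplied
v = td_update(v, internal_reward([0.8, 1.0], utility))   # synthesized
print(round(v, 2))  # 0.7
```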
Alas, an awful lot of hot air seems to surround this last point.
--
__________
|im |yler http://timtyler.org/ [email protected] Remove lock to reply.
-------------------------------------------
AGI
Archives: https://www.listbox.com/member/archive/303/=now