On 01/03/2014 07:40, Tim Tyler wrote:
Nor does it make very much theoretical difference whether the "goodness scalar" involved in the "universal currency" comes from the environment directly, or is synthesized internally from sensory data and current state using a "utility function". Alas, an awful lot of hot air seems to surround this last point.
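A minimal sketch of that point, under stated assumptions: the two-armed `environment`, the `utility` function, the `target` field and the simple value-update rule in `learn` are all hypothetical, made up for illustration, and not taken from any particular framework. The learner only ever sees one "goodness scalar" per step. It does not care whether that number arrived on a reward channel or was computed internally.

```python
import random

def utility(obs, state):
    # Hypothetical utility function: prefers observations near a target.
    return -abs(obs - state["target"])

def external_goodness(obs, env_reward, state):
    # Goodness scalar supplied directly by the environment.
    return env_reward

def internal_goodness(obs, env_reward, state):
    # Goodness scalar synthesized from sensory data and current state;
    # the environment's reward channel is ignored.
    return utility(obs, state)

def environment(action):
    # Toy two-armed environment (an assumption for illustration).
    obs = 3 if action == 0 else 5
    reward = -abs(obs - 5)  # chosen to agree with utility when target == 5
    return obs, reward

def learn(goodness, steps=200, alpha=0.1, epsilon=0.2, seed=0):
    # Epsilon-greedy learner that keeps a running value estimate per action.
    rng = random.Random(seed)
    state = {"target": 5}
    q = [0.0, 0.0]
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = max(range(2), key=lambda a: q[a])
        obs, env_reward = environment(action)
        g = goodness(obs, env_reward, state)  # the only place the source matters
        q[action] += alpha * (g - q[action])
    return q

if __name__ == "__main__":
    print(learn(external_goodness))
    print(learn(internal_goodness))
```

Because the hypothetical utility function here was chosen to compute the same number the environment hands out, both runs produce identical value estimates. Swapping the source of the scalar changes one line and leaves the learning rule untouched.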
To illustrate the hot air, here's Ben from 2009:

http://multiverseaccordingtoben.blogspot.com/2009/05/reinforcement-learning-some-limitations.html

Part of the problem is terminology. However, it is very useful to have a general theory of learning based on reward, utility - or whatever you want to call the "goodness" metric. I feel frustrated with the critics; they don't seem to get it.

-- 
__________
 |im |yler http://timtyler.org/ [email protected] Remove lock to reply.
