An interesting thought occurred to me today, and though I'm sure most of
you have already considered it, I wanted to pose my speculation here and
hear your opinions.

My proposition is that emotions exist to act as a supervisor, much as time
does in the CLA, to speed up learning.

For instance, consider a game of chess. We could create a chess-playing AI
based on the CLA that watches games of chess and implicitly learns the
rules by building a model of how board states transition into one another.
If we wanted it to play well, we could have it watch games played between
expert chess players, either human grandmasters or graph search-based AIs,
and learn the temporal patterns of "optimal" play that lead towards
winning. Then, it could play a game of chess by being fed a game state, and
predicting the next move in line with the patterns it's previously learned.
This is pretty much how a human (without supervision) would learn to play
chess as well.
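To make the imitation idea concrete, here's a toy sketch of that kind of learning, not the CLA itself: just count how often one state follows another in the "expert" sequences, then play by emitting the most frequently observed successor. The class and method names are mine, purely for illustration.

```python
from collections import defaultdict

class SequenceLearner:
    """Toy analogue of temporal pattern learning: count how often
    one state (any hashable token) follows another in training data."""

    def __init__(self):
        # transitions[state][next_state] = times observed in training games
        self.transitions = defaultdict(lambda: defaultdict(int))

    def train(self, game):
        """Learn from one expert game, given as a sequence of states/moves."""
        for state, nxt in zip(game, game[1:]):
            self.transitions[state][nxt] += 1

    def predict(self, state):
        """Play by prediction: emit the most frequently seen successor."""
        followers = self.transitions.get(state)
        if not followers:
            return None  # never saw this state; no pattern to imitate
        return max(followers, key=followers.get)

# Feed it "expert" sequences, then ask for the next move.
learner = SequenceLearner()
learner.train(["e4", "e5", "Nf3", "Nc6"])
learner.train(["e4", "e5", "Nf3", "Nf6"])
learner.train(["e4", "c5", "Nf3", "d6"])
print(learner.predict("e4"))  # imitates the majority pattern: e5
```

Note that nothing in this sketch knows what "winning" is; it only mirrors whatever patterns it was shown, which is exactly the limitation described below.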

But the funny thing is that though this AI could learn to play chess well,
simply by learning and predicting the patterns of other master players,
it's doing so without even knowing what "winning" means. It's not playing
to win at all, because we never even told it when it's won. Though it's
learning in a human-like fashion, it's not actually driven by any
particular goal. The consequence of this is that it would only be able to
emulate the patterns that it's trained on, no matter how good or bad they
are, but never be able to learn to win of its own volition.

On the other hand, a human can play better chess by practicing against
herself, because she's moving to win. What drives her towards this goal?
Well, when she reaches this goal, she feels happy. So she wants to learn
the moves that result in her happiness, independently of the external
training data she receives.

If we wanted to equip our AI with the ability to play against itself and
learn, we could use a mechanism inspired by emotion to give it positive
reinforcement when it wins. During the course of the game, we keep track of
all the predictions it made (which in turn translate to its moves), and if
it wins the game, we reward it with "happiness" by reinforcing all the
neural connections that led to those predictions. With this mechanism, over
time our AI would converge towards learning the connections that lead to
winning, without needing an expert player to watch.
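One minimal way to sketch that reward mechanism (again, just an illustration with made-up names, not the NuPIC API): sample moves in proportion to connection strength, remember every prediction made during the game, and on a win multiply up the strength of each one, a crude eligibility trace.

```python
import random
from collections import defaultdict

class ReinforcedLearner:
    """Sketch: moves are sampled in proportion to connection strength,
    and a win retroactively strengthens every prediction made during
    the game, so winning lines become more likely over time."""

    def __init__(self):
        # strength[state][move]: adjustable "connection" weight
        self.strength = defaultdict(lambda: defaultdict(lambda: 1.0))
        self.trace = []  # (state, move) pairs predicted this game

    def predict(self, state, legal_moves):
        """Choose a move stochastically, biased by learned strengths."""
        weights = [self.strength[state][m] for m in legal_moves]
        move = random.choices(legal_moves, weights=weights)[0]
        self.trace.append((state, move))
        return move

    def end_game(self, won, reward=1.5):
        """'Happiness' on a win: reinforce everything that led here."""
        if won:
            for state, move in self.trace:
                self.strength[state][move] *= reward
        self.trace.clear()
```

Played against itself many times, the weights along winning lines compound, so the sampling distribution drifts toward winning play without any expert games in the training data.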

This goal-oriented system would be useful in many contexts, and maybe it
could be a part of the NuPIC system. It could start with an ability to
reinforce the network's last N predictions when they lead to something
desirable, speeding up learning towards the goal.
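The "last N predictions" idea is naturally a fixed-size buffer: only recent decisions get credit when something desirable happens. A small sketch of that recency-based credit assignment (names are hypothetical, not anything in NuPIC today):

```python
from collections import deque

class RewardBuffer:
    """Remember only the last N predictions; reward reinforces just
    those, a simple recency-based form of credit assignment."""

    def __init__(self, n):
        self.recent = deque(maxlen=n)  # older entries fall off automatically

    def record(self, connection_id):
        """Call each time the network makes a prediction."""
        self.recent.append(connection_id)

    def reward(self, strengths, factor=1.2):
        """Something desirable happened: boost the last N connections."""
        for cid in self.recent:
            strengths[cid] = strengths.get(cid, 1.0) * factor

# Only the two most recent predictions receive the reward.
buf = RewardBuffer(2)
for cid in ["c1", "c2", "c3"]:
    buf.record(cid)
strengths = {}
buf.reward(strengths)
print(strengths)  # c1 fell outside the window; c2 and c3 are boosted
```

The choice of N trades off credit reach against noise: too small and early moves in a long winning line never get reinforced, too large and irrelevant moves share the reward.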

I'd love to hear your thoughts. Sorry this email is so long; I wrote it out
to solidify these ideas as much for myself as for this mailing list. Hope
it makes sense though!

Thanks,
Chetan
_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org