An interesting thought occurred to me today, and though I'm sure most of you have already considered it, I wanted to pose my speculation here and hear your opinions.
My proposition is that emotions exist to act as a supervisor, much as time does in the CLA, to speed up learning. For instance, consider a game of chess. We could create a chess-playing AI based on the CLA that watches games of chess and implicitly learns the rules by building a model of how board states transition into one another. If we wanted it to play well, we could have it watch games played between expert chess players, either human grandmasters or graph-search-based AIs, and learn the temporal patterns of "optimal" play that lead towards winning. Then it could play a game of chess by being fed a game state and predicting the next move in line with the patterns it has previously learned.

This is pretty much how a human (without supervision) would learn to play chess as well. But the funny thing is that although this AI could learn to play chess well, simply by learning and predicting the patterns of other master players, it would be doing so without even knowing what "winning" means. It's not playing to win at all, because we never told it when it has won. Though it's learning in a human-like fashion, it's not actually driven by any particular goal. The consequence is that it could only emulate the patterns it's trained on, no matter how good or bad they are, but could never learn to win of its own volition.

On the other hand, a human can play better chess by practicing against herself, because she's motivated to win. What drives her towards this goal? Well, when she reaches it, she feels happy. So she wants to learn the moves that result in her happiness, independently of the external training data she receives. If we wanted to equip our AI with the ability to play against itself and learn, we could use a mechanism inspired by emotion to give it positive reinforcement when it wins.
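To make the reinforcement idea concrete, here is a minimal sketch in Python. This is not NuPIC code; the class, its names, and the simple weight-table representation are all my own hypothetical stand-ins for whatever the real network's connections would be. The point is only the shape of the mechanism: keep a trace of the last N predictions, and when a "happy" outcome arrives, strengthen every connection in that trace.

```python
# Hypothetical sketch (not NuPIC code): a toy predictor whose recent
# predictions are reinforced when the game is won. The weight table
# stands in for the neural connections a real network would adjust.
class RewardDrivenPredictor:
    def __init__(self, trace_length=10, reward=0.1):
        self.weights = {}                 # (state, move) -> connection strength
        self.trace = []                   # the last N (state, move) predictions
        self.trace_length = trace_length  # how far back "happiness" reaches
        self.reward = reward              # how much each connection is strengthened

    def predict(self, state, legal_moves):
        # Pick the move with the strongest learned connection for this
        # state; ties (e.g. an unseen state) fall back to the first move.
        best = max(legal_moves,
                   key=lambda m: self.weights.get((state, m), 0.0))
        self.trace.append((state, best))
        if len(self.trace) > self.trace_length:
            self.trace.pop(0)             # forget predictions older than N
        return best

    def reinforce(self):
        # "Happiness": strengthen every connection that produced one of
        # the last N predictions, then clear the trace for the next game.
        for key in self.trace:
            self.weights[key] = self.weights.get(key, 0.0) + self.reward
        self.trace.clear()
```

In use, you would call `predict` once per move during a game, then call `reinforce` only if the game was won; over many self-play games, the connections along winning lines accumulate strength and start dominating the move choice.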
During the course of the game, we keep track of all the predictions it made (which in turn translate to its moves), and if it wins the game, we reward it with "happiness" by reinforcing all the neural connections that led to those predictions. With this mechanism, over time our AI would converge towards learning the connections that lead to winning, without needing an expert player to watch.

This goal-oriented system would be useful in many contexts, and maybe it could be a part of the NuPIC system. It could start with an ability to reinforce the network's last N predictions when they lead to something desirable, speeding up learning towards the goal.

I'd love to hear your thoughts. Sorry this email is so long; I wrote it out to solidify these ideas as much for myself as for this mailing list. Hope it makes sense though! Thanks, Chetan
_______________________________________________
nupic mailing list
[email protected]
http://lists.numenta.org/mailman/listinfo/nupic_lists.numenta.org
