Hi All,

I'm Attila Sulyok, a second-year MSc Computer Engineering student at PPCU in Budapest. I am also interested in participating in Google Summer of Code this year, specifically in developing the reinforcement learning modules of mlpack.
One of my ideas is implementing the modification to the DQN algorithm described in [1], which uses (discretised) value distributions instead of value functions. The trivial approach would be to implement it as a separate algorithm (like QLearning) or to modify the existing one, but I think it's more general than that: it should be possible to use it with any value-function-based algorithm. One idea is to hack it into a layer (I'm not sure that's possible); another is to extract the Q-update part of the code into a parameter, sort of like a loss function (a rough sketch of what I mean is in the P.S. below).

As I understand it, the current state-of-the-art algorithm for learning continuous actions with value functions is NAF [2]; it may also benefit from value distributions. The third idea I found is Hindsight Experience Replay [3], which wraps a learning algorithm such as DQN or NAF and creates additional goals to learn from.

Would mlpack benefit from implementing these? Since the reinforcement learning part of the codebase is not large, they shouldn't require large modifications to existing code.

I built the code and tested it with some small examples, and one thing I noticed (having only used keras-rl before) is the lack of metrics output during training. Is that intentional? I've only used RL for research (in my current thesis project), never in industry, so I'm not quite sure whether it would be useful. The same goes for the current state of the RL agent not being visible.

Thanks,
Attila

[1]: https://arxiv.org/pdf/1707.06887.pdf
[2]: https://arxiv.org/abs/1603.00748.pdf
[3]: https://arxiv.org/pdf/1707.01495.pdf
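P.S. To make the "Q update as a parameter" idea a bit more concrete, here is a rough, untested sketch of what I have in mind. The names (ScalarQUpdate, UpdatePolicyType) are placeholders I made up, not existing mlpack types:

#include <armadillo>

// Hypothetical sketch, not existing mlpack code: pull the Bellman-target
// computation out of the agent and into a template parameter, so a
// distributional update like the one in [1] could be swapped in without
// touching the agent itself.

// Usual scalar target: r + gamma * max_a' Q(s', a').
struct ScalarQUpdate
{
  template<typename NetworkType>
  static double Target(NetworkType& targetNetwork,
                       const arma::mat& nextState,
                       const double reward,
                       const double discount)
  {
    arma::mat nextQ;
    targetNetwork.Predict(nextState, nextQ);
    return reward + discount * nextQ.max();
  }
};

// A C51-style update policy would instead return the categorical
// distribution obtained by projecting r + gamma * z onto the fixed
// support of atoms, and the agent would be templated on the policy,
// roughly:
//
//   template<typename EnvironmentType, typename NetworkType,
//            typename UpdatePolicyType = ScalarQUpdate, ...>
//   class QLearning;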
