Hello mentors! I am Nishant, a sophomore at IIT (BHU), Varanasi, India. I would like to spend my time this summer working on the mlpack library under GSoC.
I have been working with mlpack for quite a while and have been familiarizing myself with the RL codebase. After exploring the code and going through the tests, I felt that many state-of-the-art algorithm implementations are still missing. From what I can tell, we currently only have DQN (with Double DQN), multi-step DQN, asynchronous multi-step DQN, and SARSA in the codebase; PR #1912 (PPO) also appears to be in the process of being merged. So I would like to extend the library by implementing some relatively recent model-free algorithms, along with proper tests, documentation, and a dedicated tutorial if time permits. I have the following in mind:

1) *Soft Actor-Critic and A2C/A3C*: both are quite versatile, and most people would want to use them. SAC is a relatively new idea.
2) *Twin Delayed DDPG (TD3)*: a newer, more stable variant of DDPG.
3) *ACKTR and Hindsight Experience Replay (HER) support for DQNs*: these are also recent ideas, although I am not sure of their practical use cases.
4) *Rainbow DQN*: three of its six extensions are already in the codebase, so adding the remaining ones should probably not take much time.

Implementing Rainbow along with one of the above algorithms should be enough for the 12-week program. Which of these should I proceed with when writing a proposal? Kindly let me know your thoughts on them, and do suggest any other algorithms you might have in mind. I would also like to know whether anyone else is working on the same ideas, so as to avoid redundant effort. Thanks for reading!
_______________________________________________ mlpack mailing list [email protected] http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
