Hello everyone, I'm Eshaan, a second-year student at IIT (BHU), Varanasi, India. I would like to spend the coming summer working with the mlpack library under GSoC.
I have been working with mlpack for quite a while and have been familiarizing myself with the RL codebase. I would like to propose an idea for a large project (~350 hours) and get the community's feedback to strengthen my proposal. To my knowledge, there have been several attempts at adding algorithms such as DDPG (https://github.com/mlpack/mlpack/pull/2912) and PPO (https://github.com/mlpack/mlpack/pull/2788 and https://github.com/mlpack/mlpack/pull/1912). So, I would like to extend the library by implementing some popular algorithms, along with proper tests, documentation, and a dedicated tutorial. I have the following in mind:

1) PPO - one of the most sought-after algorithms, and one that has not been implemented yet. More specifically, I intend to implement the clipped version of PPO.

2) Twin Delayed DDPG (TD3) - while DDPG can achieve great performance, it is brittle with respect to hyperparameters and other tuning. TD3 counters this with three major improvements: clipped double-Q learning, delayed policy updates, and target policy smoothing.

3) ACKTR - Actor-Critic using Kronecker-Factored Trust Region, which applies K-FAC approximate natural-gradient updates to actor-critic training.

4) Hindsight Experience Replay (HER) - particularly helpful in multi-task and sparse-reward settings, which are often encountered in practical scenarios such as robotics. It can also be added as a component to DQN, QR-DQN, SAC, TQC, TD3, DDPG, etc.

5) Revisiting and Improving Rainbow:
   - Implement various flavours of DQN such as QR-DQN, IDQN, and a modified Rainbow, as per https://arxiv.org/abs/2011.14826.
   - Benchmark DQN, Rainbow, and the other flavours against one another.
   - Benchmark our implemented algorithms against existing versions such as OpenAI's Baselines and Google's Dopamine.

Besides that, I have a question: I noticed that all components of Rainbow are already present in the library, but I am not sure why it remains a subtopic in the Reinforcement Learning section of the GSoC ideas list. Is there anything left to do for Rainbow?

Which of these should I proceed with for my proposal? Also, please do suggest any other algorithms you might be thinking of.
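To make item 1 concrete, here is a minimal sketch of the per-sample clipped surrogate objective that PPO-Clip optimizes. This is plain illustrative Python, not the mlpack API; the function name and signature are my own, and an actual implementation would operate on batched policy-ratio tensors rather than scalars:

```python
# Sketch of PPO's clipped surrogate objective (illustrative only, not mlpack code).
# ratio     = pi_new(a|s) / pi_old(a|s)  (probability ratio of the two policies)
# advantage = estimated advantage A(s, a)
# epsilon   = clipping range hyperparameter (0.2 is a common default)

def ppo_clip_objective(ratio, advantage, epsilon=0.2):
    # Clip the ratio into [1 - epsilon, 1 + epsilon].
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    # Take the pessimistic (smaller) of the unclipped and clipped terms,
    # so the policy gets no extra reward for moving outside the trust region.
    return min(ratio * advantage, clipped_ratio * advantage)

# If the new policy over-weights a good action (ratio = 1.5, advantage = +1),
# the clipped term caps the objective at (1 + epsilon) * advantage.
```

In training, the negated mean of this objective over a minibatch becomes the policy loss; clipping is what removes the KL-penalty machinery of the original PPO variant.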
What would be an ideal number of deliverables for a large-sized project on this topic? Please let me know your thoughts. Looking forward to hearing back from the community :) Thanks for reading!
_______________________________________________ mlpack mailing list [email protected] http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
