Hello Eshaan,

Thanks for the introduction and the interest in the project. PPO, TD3, ACKTR, HER, and improving the Rainbow implementation are all interesting methods that would provide a good baseline. My suggestion would be to pick one or two; I don't think it's feasible to implement everything over the summer, especially if we want to include proper tests and a dedicated tutorial; those things often take more time than anticipated.

You are right about the existing Rainbow features; there is no need to mention them on the GSoC idea page anymore. I'll go ahead and update that section.
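One small pointer on the PPO side, since you mention the clipped variant: the heart of it is just the clipped surrogate loss, which per sample boils down to something like the sketch below. This is purely illustrative (the names are made up, not existing mlpack code), just to show the shape of what would need to be implemented and tested:

    #include <algorithm>
    #include <cmath>

    // Rough sketch of the per-sample clipped PPO surrogate loss (to be
    // minimized); illustrative only, not existing mlpack API.
    double ClippedSurrogateLoss(const double newLogProb,
                                const double oldLogProb,
                                const double advantage,
                                const double epsilon = 0.2)
    {
      // Probability ratio between the updated policy and the old policy.
      const double ratio = std::exp(newLogProb - oldLogProb);

      // Restrict the ratio to [1 - epsilon, 1 + epsilon].
      const double clippedRatio =
          std::clamp(ratio, 1.0 - epsilon, 1.0 + epsilon);

      // PPO maximizes the minimum of the clipped and unclipped surrogates,
      // so negate it to get a loss for a minimizer.
      return -std::min(ratio * advantage, clippedRatio * advantage);
    }

The clip range epsilon is the main hyperparameter there; the original paper uses 0.2. Most of the project work would then be in the rollout collection, advantage estimation, tests, and documentation around that core.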
I hope this was helpful. Let me know if there is anything I should clarify.

Thanks,
Marcus

> On Mar 30, 2022, at 6:42 PM, Eshaan Agarwal <[email protected]> wrote:
>
> Hello everyone,
>
> I'm Eshaan, a second-year student at IIT (BHU), Varanasi, India. I would like
> to spend the coming summer working with the mlpack library under GSoC.
>
> I have been working with mlpack for quite a while and have been familiarizing
> myself with the RL codebase. I want to propose a potential idea for a large
> project (~350 hours) and get the community's feedback to strengthen my
> proposal.
> As far as I know, there have been various attempts at adding algorithms like
> DDPG at https://github.com/mlpack/mlpack/pull/2912, PPO at
> https://github.com/mlpack/mlpack/pull/2788 and
> https://github.com/mlpack/mlpack/pull/1912.
>
> So, I would like to extend the library by adding implementations of some
> popular algorithms, along with proper tests, documentation, and a dedicated
> tutorial. I have the following in mind:
>
> 1) PPO - PPO is one of the most sought-after algorithms that has not been
> implemented yet. More specifically, I intend to implement the clipped version
> of PPO.
> 2) Twin Delayed DDPG (TD3) - While DDPG can achieve great performance, it is
> brittle to hyperparameters and other kinds of tuning. TD3 introduces three
> major improvements to counter this.
> 3) ACKTR :
> 4) Hindsight Experience Replay (HER) - Particularly helpful in multi-task and
> sparse-reward settings, which are often encountered in practical scenarios
> like robotics. It can also be added as a component to DQN, QR-DQN, SAC, TQC,
> TD3, or DDPG.
> 5) Revisiting and improving Rainbow -
> Implement various flavors of DQN, such as QR-DQN, IDQN, and a modified
> Rainbow as per https://arxiv.org/abs/2011.14826.
> Benchmark DQN, Rainbow, and the other flavors against each other.
> Benchmark our implemented algorithms against other existing versions like
> OpenAI's Baselines, Google's Dopamine, etc.
>
> Besides that, I actually have a question - I noticed that all components of
> Rainbow are present in the library, but I am not sure why it remains a
> subtopic in the Reinforcement Learning section of the GSoC ideas page. Is
> there anything left to do for Rainbow?
>
> Which of these should I proceed with for my proposal? Also, please suggest
> any other algorithms that you might have in mind. What would be the ideal
> number of deliverables for a large-sized project on this topic? Please let me
> know your thoughts.
>
> Looking forward to hearing back from the community :)
>
> Thanks for reading!
>
> _______________________________________________
> mlpack mailing list
> [email protected]
> http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
_______________________________________________
mlpack mailing list
[email protected]
http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
