Hello Eshaan,

Thanks for the introduction and for your interest in the project. PPO, TD3,
ACKTR, HER, and improving the Rainbow implementation are all interesting
methods that would provide a good baseline. My suggestion would be to pick one
or two; I don't think it's feasible to implement everything over the summer,
especially if we want proper tests and a dedicated tutorial; those often take
more time than anticipated. You are right about the existing Rainbow features;
there is no need to mention them on the GSoC idea page anymore. I'll go ahead
and update that section.

I hope this was helpful. Let me know if there is anything I should clarify.

Thanks
Marcus

> On Mar 30, 2022, at 6:42 PM, Eshaan Agarwal <[email protected]> wrote:
> 
> Hello everyone,
> 
> I'm Eshaan, a second-year student at IIT (BHU) Varanasi, India. I would like
> to spend the coming summer working with the mlpack library under GSoC.
> 
> I have been working with mlpack for quite a while, and have been 
> familiarizing myself with the RL codebase. I want to propose a potential idea 
> for a large project (~350 hours) and get the community's feedback to 
> strengthen my proposal. 
> To my knowledge, there have been various attempts to add algorithms such as
> DDPG at https://github.com/mlpack/mlpack/pull/2912 and PPO at
> https://github.com/mlpack/mlpack/pull/2788 and
> https://github.com/mlpack/mlpack/pull/1912.
> 
> So, I would like to extend the library by implementing some popular
> algorithms, along with proper tests, documentation, and a dedicated tutorial.
> I have the following in mind:
> 
>   1) PPO - one of the most sought-after algorithms that has not been
> implemented yet. More specifically, I intend to implement the clipped version
> of PPO (see the brief sketch of the clipped objective after this list).
>   2) Twin Delayed DDPG (TD3) - While DDPG can achieve great performance, it
> is brittle with respect to hyperparameters and other tuning. TD3 counters
> this with three major improvements: clipped double-Q learning, delayed policy
> updates, and target policy smoothing.
>   3) ACKTR - Actor Critic using Kronecker-Factored Trust Region; it applies a
> Kronecker-factored approximation of the curvature (K-FAC) to compute natural
> gradient updates for the actor and critic.
>   4) Hindsight Experience Replay (HER) - particularly helpful in multi-task
> and sparse-reward settings, which are common in practical scenarios such as
> robotics. It can also be added as a replay component to DQN, QR-DQN, SAC,
> TQC, TD3, or DDPG.
>   5) Revisiting and Improving Rainbow - implement various flavours of DQN,
> such as QR-DQN, IDQN, and a modified Rainbow, as per
> https://arxiv.org/abs/2011.14826; benchmark DQN, Rainbow, and the other
> flavours against each other, and benchmark our implementations against
> existing versions such as OpenAI's Baselines and Google's Dopamine.
> 
> Besides that, I have a question: I noticed that all components of Rainbow are
> already present in the library, but I am not sure why it remains a subtopic
> in the Reinforcement Learning section of the GSoC ideas page. Is there
> anything left to do for Rainbow?
> 
> Which of these should I proceed with for my proposal? Please also suggest any
> other algorithms you have in mind. What would be an appropriate number of
> deliverables for a large-sized project on this topic? Please let me know your
> thoughts.
> 
> Looking forward to hearing back from the community :)
> 
> 
> Thanks for reading!
> 

_______________________________________________
mlpack mailing list
[email protected]
http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack