Hello everyone, I am planning on contributing to mlpack under GSoC 2021 (Reinforcement Learning project ideas <https://github.com/mlpack/mlpack/wiki/SummerOfCodeIdeas#reinforcement-learning>). Currently, mlpack has only one policy gradient method implemented, namely SAC. The PPO method is listed in the project ideas, but there is already a PR for it <https://github.com/mlpack/mlpack/pull/2788>. So, I would like to propose the implementation of other policy gradient methods as my GSoC 2021 project.
There are tons of policy gradient methods <https://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html>, but as a starting point, I would like to begin with the basic ones. OpenAI's Spinning Up <https://spinningup.openai.com/en/latest/user/introduction.html> has starter code for several policy gradient methods, i.e. vanilla policy gradient (actor-critic), TRPO, PPO, DDPG, TD3, and SAC. Following this, I wish to implement the vanilla policy gradient methods (REINFORCE and actor-critic), TRPO, and DDPG. What do you think about that as my potential GSoC 2021 project?

Besides that, I have a question about mlpack's reinforcement learning code. Why does it use template parameters everywhere? Why not use inheritance? For example, *prioritized_replay* and *random_replay* could inherit from a *base_replay_buffer* class, and the *q_networks* classes could inherit from a *base_q_network* class. The former would allow easier replay buffer customisation (e.g. new prioritization formulas), while the latter would avoid confusion about how to use the *q_learning* class (e.g. confusion like this one <https://github.com/mlpack/mlpack/issues/2849>). A rough sketch of the inheritance idea is below my signature.

--
Best Regards,
Tri Wahyu Guntara
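P.S. To make the inheritance idea concrete, here is a rough sketch of what I mean. The names (Transition, BaseReplayBuffer, RandomReplay, QLearningAgent) are simplified placeholders I made up, not mlpack's actual API; it is only meant to illustrate the design direction, not to be a drop-in implementation.

    #include <cstddef>
    #include <random>
    #include <vector>

    // One stored experience tuple (simplified; mlpack's real replay buffers
    // work on Armadillo matrices instead of std::vector).
    struct Transition
    {
      std::vector<double> state, nextState;
      int action;
      double reward;
      bool isTerminal;
    };

    // Common interface that every replay buffer would implement.
    class BaseReplayBuffer
    {
     public:
      virtual ~BaseReplayBuffer() = default;

      // Store one transition.
      virtual void Store(const Transition& t) = 0;

      // Sample a mini-batch of stored transitions.
      virtual std::vector<Transition> Sample(std::size_t batchSize) = 0;
    };

    // Uniform-sampling buffer as one concrete implementation.
    class RandomReplay : public BaseReplayBuffer
    {
     public:
      void Store(const Transition& t) override { memory.push_back(t); }

      std::vector<Transition> Sample(std::size_t batchSize) override
      {
        std::vector<Transition> batch;
        if (memory.empty())
          return batch;

        // Pick indices uniformly at random (with replacement).
        std::uniform_int_distribution<std::size_t> pick(0, memory.size() - 1);
        for (std::size_t i = 0; i < batchSize; ++i)
          batch.push_back(memory[pick(rng)]);
        return batch;
      }

     private:
      std::vector<Transition> memory;
      std::mt19937 rng{42};
    };

    // A prioritized buffer (or any new prioritization formula) would simply be
    // another subclass, and the agent could accept a BaseReplayBuffer& at
    // runtime instead of taking the replay type as a template parameter, e.g.
    // (hypothetical):
    //
    //   QLearningAgent agent(network, updater, policy, replayBuffer);

The upside would be that users can swap in a new buffer without instantiating a new agent template; the downside, of course, is the virtual-call overhead compared to the current policy-based (template) design, so I am curious what the trade-off reasoning was.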
