Hello Tri,

Welcome, and thanks for getting in touch! The methods you proposed fit well into the current codebase, so if you are interested in implementing them, please feel free to submit a proposal.

About inheritance: in general, we prefer templates over inheritance because virtual functions incur runtime overhead, and in critical inner loops, where these functions are called many, many times, that overhead is non-negligible. Of course, there are exceptions; for instance, we are currently restructuring the network code to use inheritance instead of boost::variant, because it turned out that boost::variant introduced some complexity and could be slow.
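To make the template-vs-inheritance point concrete, here is a rough sketch of the two approaches. The class and method names below (UpdatePolicyBase, HalvingPolicy, Learner, Step()) are purely illustrative and are not mlpack's actual API:

  // Purely illustrative -- not mlpack's actual classes.
  #include <cstddef>

  // Inheritance: the policy is picked at runtime through a virtual call.
  struct UpdatePolicyBase
  {
    virtual ~UpdatePolicyBase() { }
    virtual double Step(double value) const = 0;  // vtable lookup on every call
  };

  // Templates: the policy is picked at compile time, so Step() can be inlined.
  struct HalvingPolicy
  {
    double Step(double value) const { return value / 2.0; }
  };

  template<typename UpdatePolicyType>
  class Learner
  {
   public:
    double Run(double value, std::size_t iterations)
    {
      double result = value;
      for (std::size_t i = 0; i < iterations; ++i)
        result = policy.Step(result);  // resolved statically, no virtual dispatch
      return result;
    }

   private:
    UpdatePolicyType policy;
  };

  // Usage: Learner<HalvingPolicy> learner; learner.Run(8.0, 3);

Because the compiler sees the concrete policy type, it can inline Step() inside the loop, whereas the virtual version pays a dispatch on every iteration and prevents inlining.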
Let me know if I should clarify anything further.

Thanks,
Marcus

> On 29. Mar 2021, at 03:03, Wahyu Guntara <[email protected]> wrote:
>
> Hello everyone,
>
> I am planning on contributing to mlpack under GSOC 2021 (Reinforcement
> Learning project ideas
> <https://github.com/mlpack/mlpack/wiki/SummerOfCodeIdeas#reinforcement-learning>).
> Currently, there is only one implementation of policy gradient methods in
> mlpack, namely SAC. PPO method is listed in the project ideas but there's
> already a PR on that <https://github.com/mlpack/mlpack/pull/2788>. So, I
> would like to propose the implementation of other policy gradient methods as
> my GSOC 2021 project.
>
> There are tons of policy gradient methods
> <https://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html>,
> but as a starting point, I would like to implement from the basic first.
> OpenAI's Spinning Up
> <https://spinningup.openai.com/en/latest/user/introduction.html> has starting
> code for some policy gradient methods i.e. vanilla policy gradient
> (actor-critic), TRPO, PPO, DDPG, TD3, and SAC. Following from this, I wish to
> implement the vanilla policy gradient methods (reinforce and actor-critic),
> TRPO, and DDPG. What do you think about that as my potential GSOC 2021
> project?
>
> Besides that, I actually have a question about the mlpack's reinforcement
> learning methods. Why does it use template parameters everywhere? Why not use
> inheritance? For example, prioritized_replay and random_replay can inherit
> from a base_replay_buffer class, q_networks classes can inherit from
> base_q_network class. The former case will allow easier replay buffer
> customisation (e.g. maybe there are some new prioritization formulas, etc),
> while the latter case will avoid confusion on how to use q_learning class
> (e.g. confusion like this one <https://github.com/mlpack/mlpack/issues/2849>).
>
> --
> Best Regards,
> Tri Wahyu Guntara
