[mlpack] [gsoc-2019] Interested in add RL algorithm to codebase

Kaiqiang Xu Sat, 06 Apr 2019 08:51:39 -0700

Hello, Mentors,
        I am a second-year postgraduate student majoring in CS. I have 
submitted some PRs to mlpack, mainly for the decision tree and decision stump 
code. I was glad to see that implementing state-of-the-art (SOTA) RL algorithms 
is listed as a project for GSoC 2019, and I am very interested in contributing 
RL code to mlpack. I am familiar with the actor-critic family of algorithms, 
such as A3C, DDPG, and PPO. Of course, I will need to study them in more depth 
before implementing them.
        Here are my thoughts on the project. First, I will build an 
actor-critic framework so that different algorithms can be integrated into it 
easily later. Then I will implement ACKTR and TRPO/PPO. Counting May, there are 
around 4 months to finish this task. My proposal is as follows:


0. Weeks -6 to -4 (April): I am already familiar with the overall structure and 
logic of mlpack. However, I still need to spend time reading the implementation 
in methods/reinforcement_learning and in zoq/gym_tcp_api 
<https://github.com/zoq/gym_tcp_api>. I think it is better to first understand 
the system design done by earlier contributors.
1. Weeks -3 to 0 (May): review these methods and study other PyTorch-based 
implementations in order to summarize the common structure of actor-critic 
algorithms. I will present a comprehensive report on this, together with a 
draft design for a scalable reinforcement learning module.
2. Weeks 1 to 4 (start of the project): finish the actor-critic framework and 
make it compatible with the existing code, or refactor the existing code where 
needed so that it can accommodate future algorithms.
3. Weeks 5 to 8: finish PPO and ACKTR, which are SOTA methods. Some simple 
tests will be written to check that the code is correct.
4. Weeks 9 to 12: write comprehensive unit tests and post the corresponding 
documentation and tutorials. The results should be comparable to those of other 
PyTorch-based implementations. The API documentation should be clear, and a 
tutorial containing an RL example or demo should also be posted.
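
To make the framework idea in step 2 a bit more concrete, here is a rough C++ 
sketch of one possible shape for the shared part. Every name in it 
(TdAdvantage, VanillaSurrogate, ClippedSurrogate, MeanObjective) is 
hypothetical and is not existing mlpack API. It only assumes that each 
algorithm can be plugged in as a small "surrogate" type, in the spirit of 
policy-based template design, and it uses the standard one-step TD advantage 
and the standard PPO clipped objective. ACKTR and TRPO would need more than 
this (natural-gradient or trust-region steps), so this is a sketch of the idea 
and not a full design.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// NOTE: all names below are hypothetical; none of this is mlpack API.

// One-step temporal-difference advantage: r + gamma * V(s') - V(s).
// The critic supplies value and nextValue; terminal steps do not bootstrap.
inline double TdAdvantage(double reward, double value, double nextValue,
                          double gamma, bool terminal)
{
  const double target = reward + (terminal ? 0.0 : gamma * nextValue);
  return target - value;
}

// Plain policy-gradient surrogate: ratio * advantage.
struct VanillaSurrogate
{
  double operator()(double ratio, double advantage) const
  {
    return ratio * advantage;
  }
};

// PPO clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A).
struct ClippedSurrogate
{
  explicit ClippedSurrogate(double epsilon = 0.2) : epsilon(epsilon) { }

  double operator()(double ratio, double advantage) const
  {
    const double clipped =
        std::max(1.0 - epsilon, std::min(ratio, 1.0 + epsilon));
    return std::min(ratio * advantage, clipped * advantage);
  }

  double epsilon;
};

// Shared piece: average any surrogate over a batch of (ratio, advantage)
// pairs. A new algorithm only has to provide its own SurrogateType.
template<typename SurrogateType>
double MeanObjective(const std::vector<double>& ratios,
                     const std::vector<double>& advantages,
                     const SurrogateType& surrogate)
{
  assert(!ratios.empty() && ratios.size() == advantages.size());
  double sum = 0.0;
  for (std::size_t i = 0; i < ratios.size(); ++i)
    sum += surrogate(ratios[i], advantages[i]);
  return sum / static_cast<double>(ratios.size());
}
```

With this layout, MeanObjective(ratios, advantages, ClippedSurrogate(0.2)) and 
MeanObjective(ratios, advantages, VanillaSurrogate()) share the same batch 
code and differ only in the surrogate type passed in.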

What do you think of it? Can you give me some suggestions?

_______________________________________________
mlpack mailing list
[email protected]
http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
