Hello Rajesh,

> The implementation of prioritized experience replay is the smallest of the
> three ideas proposed, as the idea is much simpler than the rest. So, ideally,
> the implementation of Double DQN and the dueling architecture should take
> somewhere between 2-3 months, considering all components such as testing.
> And if there is time left after that, the last extension can be added. Since
> it is a smaller addition and I will be fully familiar with mlpack by then, I
> think adding the last part can be done quickly and can even be done
> post-summer, as I feel this component is quite useful to any RL library.
This sounds reasonable to me. I think every method you mentioned would fit into
the current codebase, so please feel free to choose the methods you find the
most interesting.

> While going through the code, though, I noticed something surprising:
> Shangtong Zhang has already implemented Double DQN. I saw it in this code:
>
> https://github.com/mlpack/mlpack/blob/master/src/mlpack/methods/reinforcement_learning/q_learning_impl.hpp
>
> Also, in one of the comments in the PR at
> https://github.com/mlpack/mlpack/pull/934, he mentions testing Double DQN
> (comment on 27th May). So I wanted to know if there is something more that is
> required to be done as part of Double DQN.

Ah right, we should close the PR to avoid any more confusion; it was just used
to track the overall progress.

> If Double DQN is already done, then I would propose that the dueling
> architecture and noisy nets be the main part of the project, with prioritized
> experience replay as the possible extension; otherwise, the older idea should
> be an achievable target.

Sounds good; note that it's possible to improve/extend the existing Double DQN
method.
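For reference, the only conceptual difference between the standard DQN target and the Double DQN target is which network selects the greedy next action. Below is a minimal, self-contained sketch of the two target computations in plain C++ with Armadillo; `QNetwork`, `learningNetwork`, and `targetNetwork` are stand-in names for illustration, not mlpack's actual types or API.

```cpp
#include <armadillo>
#include <functional>

// Stand-in: a "network" maps a state (column vector) to one Q-value per
// action. This is only a sketch, not mlpack's network interface.
using QNetwork = std::function<arma::vec(const arma::vec& state)>;

// Standard DQN target: the target network both selects and evaluates the
// greedy next action, which tends to over-estimate Q-values.
double DQNTarget(const QNetwork& targetNetwork,
                 const arma::vec& nextState,
                 const double reward,
                 const double discount,
                 const bool terminal)
{
  if (terminal)
    return reward;
  return reward + discount * targetNetwork(nextState).max();
}

// Double DQN target (van Hasselt et al., 2016): the learning network selects
// the greedy action and the target network evaluates it, decoupling action
// selection from action evaluation.
double DoubleDQNTarget(const QNetwork& learningNetwork,
                       const QNetwork& targetNetwork,
                       const arma::vec& nextState,
                       const double reward,
                       const double discount,
                       const bool terminal)
{
  if (terminal)
    return reward;
  const arma::uword bestAction = learningNetwork(nextState).index_max();
  return reward + discount * targetNetwork(nextState)(bestAction);
}
```

Because the change is confined to how the target value is formed during the update step, improving or extending an existing DQN-style learner along these lines is a comparatively small, well-isolated modification.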
> As you suggested, I went through the code to figure out what can be extended,
> and I was very happy to find that the overall code is well structured and
> hence lends itself well to reuse, such as -

You are absolutely right; make sure to include that in your proposal.

> The timeline is something that I feel can be more flexible based on progress.
> That is, if whatever has been proposed gets completed earlier than expected,
> then more features can be added (towards having all components of the Rainbow
> algorithm), and if it goes a little slower than expected, I will ensure that
> I complete everything that was part of the proposal, even post-summer.

Sounds reasonable; we should see if we can define a minimal set of goals that
ideally should be finished by the end of the summer. Also, see
https://github.com/mlpack/mlpack/wiki/Google-Summer-of-Code-Application-Guide
for some tips.

I hope anything I said was helpful, let me know if I should clarify anything.

Thanks,
Marcus

> On 1. Mar 2018, at 13:16, яαנєѕн <[email protected]> wrote:
>
> Hey Marcus,
>
> I think each idea you mentioned would fit into the existing codebase, but
> don't underestimate the time you need to implement the method, write good
> tests, etc. Each part is important and takes time, so my recommendation is
> to focus on two ideas and maybe propose to work on another one or extend an
> idea if there is time left.
>
> I completely agree with this. It will be a lengthy project, so I will propose
> something on a smaller scale.
> I actually was asking more about the fitting-into-the-codebase part, for
> which I got the answer. Thank you.
>
> So, I was thinking the following can be done:
>
> 1. Implementation of Double DQN.
>
> 2. Implementation of the dueling architecture DQN / Noisy Nets paper -
> whichever you think might be better.
>
> 3. Extensions if time permits: prioritized experience replay. The
> implementation of prioritized experience replay is the smallest of the three
> ideas proposed, as the idea is much simpler than the rest. So, ideally, the
> implementation of Double DQN and the dueling architecture should take
> somewhere between 2-3 months, considering all components such as testing.
> And if there's time left after that, the last extension can be added. Since
> it is a smaller addition and I will be fully familiar with mlpack by then, I
> think adding the last part can be done quickly and can even be done
> post-summer, as I feel this component is quite useful to any RL library.
>
> While going through the code, though, I noticed something surprising:
> Shangtong Zhang has already implemented Double DQN. I saw it in this code:
>
> https://github.com/mlpack/mlpack/blob/master/src/mlpack/methods/reinforcement_learning/q_learning_impl.hpp
>
> Also, in one of the comments in the PR at
> https://github.com/mlpack/mlpack/pull/934, he mentions testing Double DQN
> (comment on 27th May). So I wanted to know if there is something more that is
> required to be done as part of Double DQN.
>
> If Double DQN is already done, then I would propose that the dueling
> architecture and noisy nets be the main part of the project, with prioritized
> experience replay as the possible extension; otherwise, the older idea should
> be an achievable target.
>
> As you suggested, I went through the code to figure out what can be extended,
> and I was very happy to find that the overall code is well structured and
> hence lends itself well to reuse, such as -
>
> The policies are separate, so any change in the way the function approximator
> works will not affect the policy side. Hence,
> https://github.com/mlpack/mlpack/tree/master/src/mlpack/methods/reinforcement_learning/policy
> can be used as-is and will be very useful for testing new methods.
>
> The same holds for the environments:
> https://github.com/mlpack/mlpack/tree/master/src/mlpack/methods/reinforcement_learning/environment
> can be used as-is.
>
> The replay,
> https://github.com/mlpack/mlpack/blob/master/src/mlpack/methods/reinforcement_learning/replay/random_replay.hpp,
> is the part that will be extended for the prioritized experience replay
> method, since that algorithm only changes how transitions are sampled; it
> stays the same in all the other parts of the implementation.
>
> We can reuse most of what is in
> https://github.com/mlpack/mlpack/blob/master/src/mlpack/methods/reinforcement_learning/q_learning_impl.hpp
> and
> https://github.com/mlpack/mlpack/blob/master/src/mlpack/methods/reinforcement_learning/q_learning.hpp,
> but the network type will be different for both the dueling architecture and
> noisy nets; the other parts can be extended.
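To make the replay extension point above concrete, here is a rough sketch of proportional prioritized sampling (Schaul et al., 2016) in plain, self-contained C++. The names (`Transition`, `PrioritizedReplay`) are made up for illustration and deliberately independent of mlpack's RandomReplay interface; a practical implementation would add importance-sampling weights and a sum-tree so that sampling is O(log n) rather than O(n).

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Illustrative transition record; not mlpack's type.
struct Transition
{
  std::vector<double> state, nextState;
  int action;
  double reward;
  bool terminal;
};

// Simplified proportional prioritized replay: transitions are sampled with
// probability proportional to priority^alpha, where
// priority = |TD error| + epsilon. Setting alpha = 0 recovers the behaviour
// of a plain random replay buffer.
class PrioritizedReplay
{
 public:
  PrioritizedReplay(const double alpha, const double epsilon) :
      alpha(alpha), epsilon(epsilon) { }

  void Store(const Transition& transition)
  {
    buffer.push_back(transition);
    // New transitions get the current maximum priority so that each one is
    // replayed at least once before its priority is refined.
    priorities.push_back(maxPriority);
  }

  // Draw the index of one stored transition from the priority distribution.
  std::size_t Sample(std::mt19937& rng) const
  {
    std::vector<double> weights(priorities.size());
    for (std::size_t i = 0; i < priorities.size(); ++i)
      weights[i] = std::pow(priorities[i], alpha);
    std::discrete_distribution<std::size_t> dist(weights.begin(), weights.end());
    return dist(rng);
  }

  // After the learning step, refresh the priority with the new TD error.
  void UpdatePriority(const std::size_t index, const double tdError)
  {
    priorities[index] = std::abs(tdError) + epsilon;
    maxPriority = std::max(maxPriority, priorities[index]);
  }

  const Transition& Get(const std::size_t index) const { return buffer[index]; }

 private:
  double alpha, epsilon;
  double maxPriority = 1.0;
  std::vector<Transition> buffer;
  std::vector<double> priorities;
};
```

All the rest of the Q-learning loop needs from such a buffer is the sampled transitions plus their indices (to feed the TD errors back), which is why this change really can stay isolated from the policy, environment, and network components listed above.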
> So, I would like to know what more is required as part of the proposal, and
> also whether Double DQN was fully implemented or not.
>
> Regards,
> Rajesh D M
>
>
> On Tue, Feb 27, 2018 at 3:27 AM, Marcus Edel <[email protected]> wrote:
> Hello Rajesh,
>
>> As you mentioned, I've been working on the new environment (Gridworld from
>> Sutton and Barto - it's a simple environment) for testing out. I think it is
>> ready, but I want to test it in the standard way. So could you please tell
>> me how exactly the CartPole and MountainCar environments were tested/run in
>> general, so that I can follow a similar procedure to see whether what I have
>> done is correct or not.
>
> That sounds great;
> https://github.com/mlpack/mlpack/blob/master/src/mlpack/tests/rl_components_test.cpp
> should be helpful.
>
>> So, I think mlpack should have this latest state of the art available as
>> part of the library. It may not be possible to implement all of the
>> above-mentioned techniques in 3 months, but I feel they are not very hard to
>> add either, as they are mostly extensions on top of each other, and I would
>> also be happy to continue contributing after GSoC as well.
>>
>> So, can we work towards Rainbow as the goal for GSoC (with a few but not all
>> components)? Will that be a good idea?
>
> Sounds like you already put some time into the project idea, that is great. I
> think each idea you mentioned would fit into the existing codebase, but don't
> underestimate the time you need to implement the method, write good tests,
> etc. Each part is important and takes time, so my recommendation is to focus
> on two ideas and maybe propose to work on another one or extend an idea if
> there is time left. Also, another tip for the proposal is to mention the
> parts that can be reused or have to be extended over the summer; a clear
> structure of the project idea helps a lot.
>
> I hope anything I said was helpful, let me know if I should clarify anything.
>
> Thanks,
> Marcus
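As a side note on the Gridworld environment discussed above: the quickest sanity check is usually a tiny, self-contained model of the dynamics that can be asserted against by hand. The sketch below is a purely hypothetical grid-world in the spirit of Sutton and Barto's example; it is not the interface of mlpack's CartPole or MountainCar environments, and the linked rl_components_test.cpp remains the authoritative reference for how those are exercised.

```cpp
#include <cstddef>

// Illustrative grid-world: an agent on a width x height grid pays -1 per step
// and the episode ends at the goal cell. Names and interface are hypothetical.
class GridWorld
{
 public:
  struct State { std::size_t row, col; };
  enum class Action { Up, Down, Left, Right };

  GridWorld(const std::size_t width, const std::size_t height,
            const State goal) : width(width), height(height), goal(goal) { }

  // Start every episode in the top-left corner.
  State InitialState() const { return State{0, 0}; }

  bool IsTerminal(const State& state) const
  {
    return state.row == goal.row && state.col == goal.col;
  }

  // Apply an action, write the successor state, and return the reward.
  double Step(const State& state, const Action action, State& nextState) const
  {
    nextState = state;
    switch (action)
    {
      case Action::Up:    if (nextState.row > 0) --nextState.row; break;
      case Action::Down:  if (nextState.row + 1 < height) ++nextState.row; break;
      case Action::Left:  if (nextState.col > 0) --nextState.col; break;
      case Action::Right: if (nextState.col + 1 < width) ++nextState.col; break;
    }
    return -1.0;  // Constant step cost: the agent should reach the goal fast.
  }

 private:
  std::size_t width, height;
  State goal;
};
```

Testing such an environment then reduces to a handful of assertions: the initial state is not terminal, each action moves at most one cell and never leaves the grid, and a shortest path to the goal accumulates the expected total reward.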
>> On 26. Feb 2018, at 19:23, яαנєѕн <[email protected]> wrote:
>>
>> Hey Marcus, Rajesh here.
>>
>> As you mentioned, I've been working on the new environment (Gridworld from
>> Sutton and Barto - it's a simple environment) for testing out. I think it is
>> ready, but I want to test it in the standard way. So could you please tell
>> me how exactly the CartPole and MountainCar environments were tested/run in
>> general, so that I can follow a similar procedure to see whether what I have
>> done is correct or not.
>>
>> Also, with this I have gotten a good idea of how mlpack works and am getting
>> more and more used to it by the day. I also wanted to start working on the
>> proposal in parallel.
>>
>> I went through everything Shangtong Zhang had done last year as part of GSoC
>> and learnt that DQN and async n-step Q-learning are the major contributions,
>> with the rest of his work revolving around them.
>>
>> So I think the following can be extensions to his work which would fit well
>> into the existing architecture built by him:
>>
>> 1. Double DQN (as suggested by you in the ideas list).
>>
>> 2. Prioritized experience replay: In this method, the samples are no longer
>> selected at random from the replay buffer, as they are in DQN, but are
>> prioritized based on a parameter; one such parameter is the TD error. This
>> method's results beat the results of Double DQN.
>>
>> 3. After this, DeepMind released their next improvement, the dueling
>> architecture: In this architecture, the state value and the action values
>> from the state-action function are separated in the neural network
>> architecture and combined back before the last step. The intuition behind
>> this is that the value of a state does not always depend only on the actions
>> that can be taken from that state.
>>
>> 4. They then came up with Noisy Nets: another improvement, usable alongside
>> all the above methods, that adds noise to the neural network weights, which,
>> according to them, improves the overall exploration efficiency.
>>
>> They also had other improvements in multi-step RL and distributional RL.
>>
>> After this is when they came up with their best algorithm:
>>
>> Rainbow: It is a combination of all the above-mentioned algorithms. They
>> were able to combine them because they all work on different parts of the
>> RL agent's learning (exploration, policy update, etc.). The results of
>> Rainbow far exceed those of any of the other techniques out there. The paper
>> also shows results for other combinations of the above-mentioned methods.
>> http://www.mlpack.org/gsocblog/ShangtongZhangPage.html
>>
>> So, I think mlpack should have this latest state of the art available as
>> part of the library. It may not be possible to implement all of the
>> above-mentioned techniques in 3 months, but I feel they are not very hard to
>> add either, as they are mostly extensions on top of each other, and I would
>> also be happy to continue contributing after GSoC as well.
>>
>> So, can we work towards Rainbow as the goal for GSoC (with a few but not all
>> components)? Will that be a good idea?
>>
>> I have already read all the papers as part of my thesis work, and I am
>> actually working towards improving upon them, so I have a thorough
>> understanding of all the concepts and can start working on them at the
>> earliest.
>>
>> PS: The other implementation, Proximal Policy Optimization (PPO), is
>> actually an improvement over Trust Region Policy Optimization (TRPO), so to
>> implement PPO, TRPO might have to be implemented first. Also, that is in the
>> domain of continuous action spaces and continuous state spaces (Rainbow and
>> the other techniques can handle a continuous state space but not a
>> continuous action space), and the other state of the art in that area is
>> Deep Deterministic Policy Gradient (DDPG). So if you want that to be part of
>> mlpack, it will probably be a good idea to implement those three together. I
>> am equally interested in both sets of implementations (I have already gone
>> through all three of these papers as well).
>>
>> I personally feel going with the first set is better, as Shangtong Zhang has
>> created a great base to build new methods on top of. Please let me know what
>> you think about the same.
>>
>> --
>> Regards,
>> Rajesh D M
>> <Distributional RL.pdf><Dueling Network Architectures for DeepRL.pdf>
>> <Noisy Networks for exploration RL.pdf><Prioritized_experience_replay.pdf>
>> <TrustRegionPolicyOptimisation.pdf>
>
>
> --
> Regards,
> Rajesh D M
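To pin down the aggregation step described in item 3 above: in the dueling architecture the network outputs a scalar state value V(s) and a vector of per-action advantages A(s, a), which are combined into Q-values, with the mean advantage subtracted so that V and A remain identifiable. The sketch below shows just that combination, plus the weight-perturbation idea behind the Noisy Nets mentioned in item 4, in plain C++ with Armadillo; it is illustrative only and does not reflect how such layers would actually be structured inside mlpack.

```cpp
#include <armadillo>

// Dueling aggregation (Wang et al., 2016): combine a scalar state value and a
// vector of per-action advantages into Q-values,
//   Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a').
arma::vec DuelingCombine(const double value, const arma::vec& advantages)
{
  return value + advantages - arma::mean(advantages);
}

// Noisy linear layer (Fortunato et al., 2017), forward pass only: every weight
// is mu + sigma * epsilon with freshly sampled noise epsilon, so exploration
// comes from the perturbed weights instead of an epsilon-greedy policy.
arma::vec NoisyLinearForward(const arma::mat& weightMu,
                             const arma::mat& weightSigma,
                             const arma::vec& biasMu,
                             const arma::vec& biasSigma,
                             const arma::vec& input)
{
  // Independent Gaussian noise; the paper also proposes cheaper factorised
  // noise for larger layers.
  const arma::mat weightEps(weightMu.n_rows, weightMu.n_cols, arma::fill::randn);
  const arma::vec biasEps(biasMu.n_elem, arma::fill::randn);

  const arma::mat weight = weightMu + weightSigma % weightEps;
  const arma::vec bias = biasMu + biasSigma % biasEps;
  return weight * input + bias;
}
```

Both pieces only change the network that produces the Q-values, which is consistent with the observation above that q_learning.hpp / q_learning_impl.hpp can largely be reused with a different network type.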
