Hello Sahith,

> Apologies for the long delay in my reply. I had my midsem examinations going
> on and was unable to respond.

No worries, I hope everything went well.

> The time metric I had in mind was more related to how long the actual in-game
> time is, which I think is independent of the system and is part of the
> environment itself. However, I realized that most games already have a score
> that focuses on time, so this might seem redundant.

I see; we could definitely introduce a related metric, e.g. one that counts the
number of evaluations/iterations and is therefore independent of the system it
runs on.
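
Just to make that concrete, here is a rough sketch of what such a counter could
look like; the class and member names (StepCounter, Step(), EndEpisode()) are
made up for illustration and are not part of mlpack:

    #include <cstddef>

    // Counts environment steps and episodes instead of wall-clock time, so the
    // numbers are comparable across machines.
    class StepCounter
    {
     public:
      StepCounter() : steps(0), episodes(0) { }

      // Call once per environment step (i.e. per action taken).
      void Step() { ++steps; }

      // Call once when an episode ends.
      void EndEpisode() { ++episodes; }

      std::size_t Steps() const { return steps; }
      std::size_t Episodes() const { return episodes; }

      // Average number of steps the agent needed per episode so far.
      double StepsPerEpisode() const
      {
        return episodes == 0 ? 0.0 : static_cast<double>(steps) / episodes;
      }

     private:
      std::size_t steps;
      std::size_t episodes;
    };

Something like this could be incremented inside the gym wrapper's step call, so
every method is measured the same way regardless of the machine it runs on.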

> In one of your previous mails you mentioned we should initially focus on
> existing mlpack methods for the training. The only mlpack RL method currently
> present is a Q-Learning model from last year's GSOC which includes policies
> and also experience replays. While this is good for the basic environments in
> OpenAI we should implement at least one more method to supplement it.

Sure, if you'd like to add another method, please feel free; I guess we could
also see if we can use a recurrent network, or CMA-ES/CNE, to solve a given
task.
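
To sketch what I mean by the CMA-ES/CNE route: those optimizers are
derivative-free, so all we really need is an objective that maps a set of
policy parameters to the (negative) episode return. Below is only an
illustration, not working mlpack code; ToyEnvironment and EpisodeReturnFunction
are made-up names, the toy environment stands in for the actual gym wrapper,
and the exact interface each optimizer expects should be checked against the
docs first:

    #include <armadillo>
    #include <cstddef>

    // Stand-in for the gym wrapper, just so the sketch is self-contained.
    struct ToyEnvironment
    {
      std::size_t t = 0;

      arma::vec Reset() { t = 0; return arma::vec(4, arma::fill::randu); }

      // Returns the reward and fills nextState and done.
      double Step(const std::size_t action, arma::vec& nextState, bool& done)
      {
        nextState = arma::vec(4, arma::fill::randu);
        done = (++t >= 200);
        return (action == 0) ? 1.0 : 0.5;
      }
    };

    // Objective in the shape a derivative-free optimizer can minimize: one
    // call runs a full episode with a linear policy built from `parameters`
    // and returns the negative episode return.
    class EpisodeReturnFunction
    {
     public:
      EpisodeReturnFunction(ToyEnvironment& env) : env(env) { }

      double Evaluate(const arma::mat& parameters)
      {
        const std::size_t actions = 2;  // e.g. a CartPole-style action space
        arma::vec state = env.Reset();
        double episodeReturn = 0.0;
        bool done = false;

        while (!done)
        {
          // Linear policy for illustration: scores = W * state.
          arma::mat w = arma::reshape(parameters, actions, state.n_elem);
          arma::vec scores = w * state;
          const std::size_t action = scores.index_max();

          arma::vec nextState;
          episodeReturn += env.Step(action, nextState, done);
          state = nextState;
        }

        // Optimizers minimize, so negate the return.
        return -episodeReturn;
      }

     private:
      ToyEnvironment& env;
    };

An evolutionary optimizer would then call Evaluate() once per candidate
parameter set, one episode per call; swapping the linear policy for an FFN or a
recurrent network does not change that shape.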

> 1. Double DQN could be a good fit as it just builds on top of the current
> method and hence would be the best to pursue.
> 2. An advanced Policy Gradient method which would take more time but could
> also extend the number of environments that can be solved in the future.

I like both ideas; we should just make sure the scope stays manageable since, as
you already pointed out, the advanced Policy Gradient method might take more
time.
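
Since Double DQN really only changes how the targets are computed, it should
slot nicely on top of the existing Q-Learning code. As a rough sketch (the
function name and the actions-by-batch matrix layout are assumptions for
illustration, not the existing mlpack interface), the only new piece is roughly:

    #include <armadillo>

    // Double DQN targets: the online network selects the greedy action, the
    // target network evaluates it. Vanilla DQN would instead use
    // targetQNext.col(i).max(), i.e. select and evaluate with the same network.
    arma::rowvec DoubleDQNTargets(
        const arma::mat& onlineQNext,   // Q_online(s', .), actions x batch
        const arma::mat& targetQNext,   // Q_target(s', .), actions x batch
        const arma::rowvec& rewards,    // r, 1 x batch
        const arma::rowvec& terminal,   // 1 if s' is terminal, else 0
        const double discount)
    {
      arma::rowvec targets(rewards.n_elem);
      for (size_t i = 0; i < rewards.n_elem; ++i)
      {
        const arma::uword bestAction = onlineQNext.col(i).index_max();
        const double nextValue = targetQNext(bestAction, i);
        targets(i) = rewards(i) + (1.0 - terminal(i)) * discount * nextValue;
      }
      return targets;
    }

Everything else (replay, the exploration policy, the network update itself) can
stay as it is, which is why I agree it is probably the more manageable of the
two.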

> Also, in regards to building an API, I would like to know whether you wanted
> to focus on building on top of the methods already present in mlpack and
> extend them as much as we can, or build something from scratch but using the
> mlpack methods present whenever we need them.

Are you talking about the additional RL method?

Let me know if I should clarify anything.

Thanks,
Marcus

> On 11. Mar 2018, at 04:54, Sahith D <sahit...@gmail.com> wrote:
> 
> Hello Marcus,
> Apologies for the long delay in my reply. I had my midsem examinations going 
> on and was unable to respond.
> 
> The time metric I had in mind was more related to how long the actual in-game
> time is, which I think is independent of the system and is part of the
> environment itself. However, I realized that most games already have a score
> that focuses on time, so this might seem redundant.
> 
> In one of your previous mails you mentioned we should initially focus on 
> existing mlpack methods for the training. The only mlpack RL method currently 
> present is a Q-Learning model from last year's GSOC which includes policies 
> and also experience replays. While this is good for the basic environments in 
> OpenAI we should implement at least one more method to supplement it.
> 
> 1. Double DQN could be a good fit as it just builds on top of the current
> method and hence would be the best to pursue.
> 2. An advanced Policy Gradient method which would take more time but could 
> also extend the number of environments that can be solved in the future.
> 
> Also, in regards to building an API, I would like to know whether you wanted
> to focus on building on top of the methods already present in mlpack and
> extend them as much as we can, or build something from scratch but using the
> mlpack methods present whenever we need them.
> 
> Thanks
> 
> 
> 
> On Sat, Mar 3, 2018 at 5:39 PM Marcus Edel <marcus.e...@fu-berlin.de> wrote:
> Hello Sahith,
> 
> I'm not sure about the time metric; it might be meaningless if not run on the
> same or a similar system. If we only compare our own methods, that should be
> fine though. The rest sounds reasonable to me.
> 
> Best,
> Marcus
> 
>> On 2. Mar 2018, at 22:34, Sahith D <sahit...@gmail.com> wrote:
>> 
>> Hi Marcus,
>> 
>> Making pre-trained models sounds good; however, we'll have to pick the most
>> popular or easiest environments for this, at least at the start.
>> For meaningful metrics other than iterations we could use the score of the
>> game, which is the best possible metric, and also the time it takes to reach
>> that score. Depending on the environment, a lower or higher time could be
>> better. The user-controlled parameters could also include:
>> 1. Exploration rate/ Exploration rate decay
>> 2. Learning rate 
>> 3. Reward size
>> Perhaps a few more but these are essential.
>> 
>> I like the idea of creating an API to upload results. We could include the
>> metrics that we've talked about and perhaps a bit more, like the recording
>> that you mentioned, possibly one where they can watch the agent learn through
>> each iteration and see it improve.
>> 
>> Thanks,
>> Sahith  
>> 
>> On Fri, Mar 2, 2018 at 6:11 PM Marcus Edel <marcus.e...@fu-berlin.de> wrote:
>> Hello Sahith,
>> 
>>> This looks very feasible along with being cool and intuitive. We could
>>> implement a system where a beginner can just choose an environment, pick a
>>> particular pre-built method, and compare different methods through
>>> visualizations and the actual emulation of the game environment. Other users
>>> can have more control, call only the specific functions of the API which
>>> they need, and modify everything; these people would be the ones who would
>>> benefit most from having a leaderboard for comparison with other users on
>>> OpenAI Gym.
>> 
>> I think merging ideas from both sides is a neat idea; the first step should
>> focus on the existing mlpack methods, provide pre-trained models for specific
>> parameter sets, and output some metrics. Providing a recording of the
>> environment is also a neat feature. Note the optimizer visualization allows a
>> user to finely control the optimizer parameters, but only because the time to
>> find a solution is low; in the case of RL methods we are talking about
>> minutes or hours, so providing pre-trained models is essential. If you like
>> the idea, we should think about some meaningful metrics besides the number of
>> iterations.
>> 
>> For other frameworks, one idea is to provide an API to upload the results;
>> based on that information, we could generate the metrics.
>> 
>> Let me know what you think.
>> 
>> Thanks,
>> Marcus
>> 
>>> On 2. Mar 2018, at 13:08, Sahith D <sahit...@gmail.com> wrote:
>>> 
>>> Hi Marcus,
>>> This looks very feasible along with being cool and intuitive. We could
>>> implement a system where a beginner can just choose an environment, pick a
>>> particular pre-built method, and compare different methods through
>>> visualizations and the actual emulation of the game environment. Other users
>>> can have more control, call only the specific functions of the API which
>>> they need, and modify everything; these people would be the ones who would
>>> benefit most from having a leaderboard for comparison with other users on
>>> OpenAI Gym.
>>> Though I would like to know how in-depth you would want this to be. The
>>> optimizer tutorial seems to have pretty much all the major optimizers
>>> currently being used. Do you think we should try something that's as
>>> extensive, or just set up a framework for future contributors?
>>> 
>>> Thanks,
>>> Sahith
>>> 
>>> On Thu, Mar 1, 2018 at 3:35 PM Marcus Edel <marcus.e...@fu-berlin.de> wrote:
>>> Hello Sahith,
>>> 
>>> I like the idea; also, since OpenAI abandoned the leaderboard, this could be
>>> a great opportunity. I'm a fan of giving a user the opportunity to test the
>>> methods without much hassle, so one idea is to provide a web interface that
>>> exposes a minimal set of settings, something like:
>>> 
>>> www.mlpack.org/docs/mlpack-git/doxygen/optimizertutorial.html
>>> 
>>> Let me know what you think; there are a bunch of interesting features that
>>> we could look into, but we should make sure each is tangible and useful.
>>> 
>>> Thanks,
>>> Marcus
>>> 
>>>> On 28. Feb 2018, at 23:03, Sahith D <sahit...@gmail.com> wrote:
>>>> 
>>>> A playground-type project sounds like a great idea. We could start by using
>>>> the current Q-Learning method already present in the mlpack repository and
>>>> then apply it to the environments in Gym as a sort of tutorial. We could
>>>> then move on to more complex methods like Double Q-Learning and Monte Carlo
>>>> Tree Search (just suggestions) just to get started, so that more people will
>>>> be encouraged to try their hand at solving the environments in more creative
>>>> ways using C++, as the Python community is already pretty strong. If we
>>>> could build something of a leaderboard similar to what OpenAI Gym already
>>>> has, then it could foster a creative community of people who want to try
>>>> more RL. Does this sound good, or can it be improved upon?
>>>> 
>>>> Thanks,
>>>> Sahith.
>>>> 
>>>> On Wed, Feb 28, 2018 at 3:50 PM Marcus Edel <marcus.e...@fu-berlin.de> wrote:
>>>> Hello Sahith,
>>>> 
>>>>> 1. We could implement all the fundamental RL algorithms like those over
>>>>> here: https://github.com/dennybritz/reinforcement-learning. This repository
>>>>> contains nearly all the algorithms that are useful for RL according to
>>>>> David Silver's RL course. They're all currently in Python so it could just
>>>>> be a matter of porting them over to use mlpack.
>>>> 
>>>> I don't think implementing all the methods is something we should pursue
>>>> over the summer; writing the method itself and coming up with some
>>>> meaningful tests takes time. Also, in my opinion, instead of implementing
>>>> all methods, we should pick methods that make sense in a specific context
>>>> and make them as fast and easy to use as possible.
>>>> 
>>>>> 2. We could implement fewer algorithms but work more on solving the OpenAI
>>>>> gym environments using them. This would require tighter integration of the
>>>>> gym wrapper that you have already written. If enough environments can be
>>>>> solved then this could become a viable C++ library for comparing RL
>>>>> algorithms in the future.
>>>> 
>>>> I like the idea; this could be a great way to present the RL infrastructure
>>>> to a wider audience, in the form of a playground.
>>>> 
>>>> Let me know what you think.
>>>> 
>>>> Thanks,
>>>> Marcus
>>>> 
>>>>> On 27. Feb 2018, at 23:01, Sahith D <sahit...@gmail.com> wrote:
>>>>> 
>>>>> Hi Marcus,
>>>>> Sorry for not updating you earlier as I had some exams that I needed to 
>>>>> finish first.
>>>>> I've been working on the policy gradient method in this repository, which
>>>>> you can see here: https://github.com/SND96/mlpack-rl
>>>>> I also had some ideas on what this project could be about.
>>>>> 
>>>>> 1. We could implement all the fundamental RL algorithms like those over
>>>>> here: https://github.com/dennybritz/reinforcement-learning. This repository
>>>>> contains nearly all the algorithms that are useful for RL according to
>>>>> David Silver's RL course. They're all currently in Python so it could just
>>>>> be a matter of porting them over to use mlpack.
>>>>> 2. We could implement fewer algorithms but work more on solving the 
>>>>> OpenAI gym environments using them. This would require tighter 
>>>>> integration of the gym wrapper that you have already written. If enough 
>>>>> environments can be solved then this could become a viable C++ library 
>>>>> for comparing RL algorithms in the future.
>>>>> 
>>>>> Right now I'm working on solving one of the environments in Gym using a
>>>>> Deep Q-Learning approach similar to what is already in the mlpack library
>>>>> from last year's GSoC. It's taking a bit longer than I hoped as I'm still
>>>>> familiarizing myself with some of the server calls being made and how to
>>>>> properly get information about the environments. I would appreciate your
>>>>> thoughts on the ideas that I have and anything else that you had in mind.
>>>>> 
>>>>> Thanks!
>>>>> Sahith
>>>>> 
>>>>> On Fri, Feb 23, 2018 at 1:50 PM Sahith D <sahit...@gmail.com> wrote:
>>>>> Hi Marcus,
>>>>> I've been having difficulties compiling mlpack, which has stalled my
>>>>> progress. I've opened an issue about it and would appreciate any help.
>>>>> 
>>>>> On Thu, Feb 22, 2018 at 10:09 AM Sahith D <sahit...@gmail.com> wrote:
>>>>> Hey Marcus,
>>>>> No problem with the slow response as I was familiarizing myself better 
>>>>> with the codebase and the methods present in the meantime. I'll start 
>>>>> working on what you mentioned. I'll notify you when I finish.
>>>>> 
>>>>> Thanks!
>>>>> 
>>>>> On Thu, Feb 22, 2018 at 4:56 AM Marcus Edel <marcus.e...@fu-berlin.de> wrote:
>>>>> Hello Sahith,
>>>>> 
>>>>> thanks for getting in touch and sorry for the slow response.
>>>>> 
>>>>> > My name is Sahith. I've been working on Reinforcement Learning for the 
>>>>> > past year
>>>>> > and am interested in coding with mlpack on the RL project for this 
>>>>> > summer. I've
>>>>> > been going through the codebase and have managed to get the Open AI gym 
>>>>> > api up
>>>>> > and running on my computer. Is there any other specific task I can do 
>>>>> > while I
>>>>> > get to know more of the codebase?
>>>>> 
>>>>> Great that you got it all working. Another good entry point is to write a
>>>>> simple RL method; one simple method that comes to mind is the Policy
>>>>> Gradients method. Another idea is to write an example for solving a Gym
>>>>> environment with the existing codebase, something in the vein of the Kaggle
>>>>> Digit Recognizer Eugene wrote
>>>>> (https://github.com/mlpack/models/tree/master/Kaggle/DigitRecognizer).
>>>>> 
>>>>> Let me know if I should clarify anything.
>>>>> 
>>>>> Thanks,
>>>>> Marcus
>>>>> 
>>>>> > On 19. Feb 2018, at 20:41, Sahith D <sahit...@gmail.com> wrote:
>>>>> >
>>>>> > Hello Marcus,
>>>>> > My name is Sahith. I've been working on Reinforcement Learning for the 
>>>>> > past year and am interested in coding with mlpack on the RL project for 
>>>>> > this summer. I've been going through the codebase and have managed to 
>>>>> > get the Open AI gym api up and running on my computer. Is there any 
>>>>> > other specific task I can do while I get to know more of the codebase?
>>>>> > Thanks!
>>>>> 
>>>> 
>>> 
>> 
> 
