Hello Sahith,

> Apologies for the long delay in my reply. I had my midsem examinations going
> on and was unable to respond.

No worries, I hope everything went well.

> The time metric I had in mind was more related to how long the actual in-game
> time is, which I think is independent of the system and is part of the
> environment itself. However, I realized that most games already have a score
> that focuses on time, so this might seem redundant.

I see; we could definitely introduce a related metric, e.g. one that counts the
number of evaluations/iterations.

> In one of your previous mails you mentioned we should initially focus on
> existing mlpack methods for the training. The only mlpack RL method currently
> present is a Q-Learning model from last year's GSoC, which includes policies
> and also experience replays. While this is good for the basic environments in
> OpenAI, we should implement at least one more method to supplement it.

Sure, if you'd like to add another method, please feel free; I guess we could
also see if we can use a recurrent network, CMAES, or CNE to solve a given
task.

> 1. Double DQN could be a good fit as it just builds on top of the current
> method and hence would be the best to pursue.
> 2. An advanced Policy Gradient method which would take more time but could
> also extend the number of environments that can be solved in the future.

I like both ideas, we should just make sure it is manageable; as you already
pointed out, the advanced Policy Gradient method might take more time. I put a
rough sketch below my signature of how I'd expect Double DQN to sit on top of
the existing code.

> Also in regards to building an API, I would like to know whether you wanted
> to focus on building on top of the methods already present in mlpack and
> extend them as much as we can, or build something from scratch but use the
> mlpack methods present whenever we need them.

Are you talking about the additional RL method? Let me know if I should clarify
anything.

Thanks,
Marcus
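
P.S. To make the Double DQN point a bit more concrete, here is a rough sketch
of how I would expect it to sit on top of the existing Q-Learning code from
last year's GSoC. This is written from memory of the q_learning_test.cpp
setup, so treat the header paths, constructor arguments, and layer sizes as
assumptions that need to be checked against the current codebase; the only
conceptual change for Double DQN is the target computation noted in the
comments.

#include <mlpack/core.hpp>
#include <mlpack/methods/ann/ffn.hpp>
#include <mlpack/methods/ann/init_rules/gaussian_init.hpp>
#include <mlpack/methods/ann/layer/layer.hpp>
#include <mlpack/methods/reinforcement_learning/q_learning.hpp>
#include <mlpack/methods/reinforcement_learning/environment/cart_pole.hpp>
#include <mlpack/methods/reinforcement_learning/policy/greedy_policy.hpp>
#include <mlpack/methods/reinforcement_learning/replay/random_replay.hpp>
#include <mlpack/methods/reinforcement_learning/training_config.hpp>
#include <mlpack/core/optimizers/adam/adam_update.hpp>

using namespace mlpack;
using namespace mlpack::ann;
using namespace mlpack::rl;
using namespace mlpack::optimization;

int main()
{
  // Feed-forward network that maps a CartPole state (4 values) to one
  // Q-value per action (2 actions).
  FFN<MeanSquaredError<>, GaussianInitialization> network(
      MeanSquaredError<>(), GaussianInitialization(0, 0.001));
  network.Add<Linear<>>(4, 64);
  network.Add<ReLULayer<>>();
  network.Add<Linear<>>(64, 2);

  // Epsilon-greedy exploration (initial epsilon, anneal interval, minimum
  // epsilon) and a uniform experience replay buffer (batch size, capacity).
  GreedyPolicy<CartPole> policy(1.0, 1000, 0.1);
  RandomReplay<CartPole> replayMethod(32, 10000);

  TrainingConfig config;
  config.StepSize() = 0.01;
  config.Discount() = 0.99;
  config.TargetNetworkSyncInterval() = 100;
  config.ExplorationSteps() = 100;
  config.StepLimit() = 200;

  // The existing agent. A Double DQN variant would keep this exact setup and
  // only change how the training target is computed:
  //   DQN:        y = r + gamma * max_a Q_target(s', a)
  //   Double DQN: y = r + gamma * Q_target(s', argmax_a Q_online(s', a))
  // i.e. the online network picks the action, the target network scores it.
  QLearning<CartPole, decltype(network), AdamUpdate, decltype(policy)>
      agent(std::move(config), std::move(network), std::move(policy),
            std::move(replayMethod));

  // Train for a fixed number of episodes and track the average return; the
  // return per episode is also one of the metrics we could report later.
  arma::running_stat<double> averageReturn;
  for (size_t episode = 0; episode < 500; ++episode)
    averageReturn(agent.Episode());

  std::cout << "Average return: " << averageReturn.mean() << std::endl;
  return 0;
}

If there is already a switch for the Double DQN target somewhere in the config
I may have simply missed it; either way the change should be small, and the
policy, replay, and metric code stays exactly the same.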

> On 11. Mar 2018, at 04:54, Sahith D <[email protected]> wrote:
>
> Hello Marcus,
> Apologies for the long delay in my reply. I had my midsem examinations going
> on and was unable to respond.
>
> The time metric I had in mind was more related to how long the actual in-game
> time is, which I think is independent of the system and is part of the
> environment itself. However, I realized that most games already have a score
> that focuses on time, so this might seem redundant.
>
> In one of your previous mails you mentioned we should initially focus on
> existing mlpack methods for the training. The only mlpack RL method currently
> present is a Q-Learning model from last year's GSoC, which includes policies
> and also experience replays. While this is good for the basic environments in
> OpenAI, we should implement at least one more method to supplement it.
>
> 1. Double DQN could be a good fit as it just builds on top of the current
> method and hence would be the best to pursue.
> 2. An advanced Policy Gradient method which would take more time but could
> also extend the number of environments that can be solved in the future.
>
> Also in regards to building an API, I would like to know whether you wanted
> to focus on building on top of the methods already present in mlpack and
> extend them as much as we can, or build something from scratch but use the
> mlpack methods present whenever we need them.
>
> Thanks
>
> On Sat, Mar 3, 2018 at 5:39 PM Marcus Edel <[email protected]> wrote:
> Hello Sahith,
>
> I'm not sure about the time metric, it might be meaningless if not run on the
> same or similar system. If we only compare our own methods, that should be
> fine though. The rest sounds reasonable to me.
>
> Best,
> Marcus
>
>> On 2. Mar 2018, at 22:34, Sahith D <[email protected]> wrote:
>>
>> Hi Marcus,
>>
>> Making pre-trained models sounds good; however, we'll have to pick the most
>> popular or easiest environments for this, at least in the start.
>> For meaningful metrics other than iterations we could use the score of the
>> game, which is the best possible metric, and also the time it takes to
>> reach that score. Depending on the environment, a low time or a large time
>> could be better. The user-controlled parameters could also include
>> 1. Exploration rate / exploration rate decay
>> 2. Learning rate
>> 3. Reward size
>> Perhaps a few more, but these are essential.
>>
>> I like the idea of creating an API to upload results. We could include the
>> metrics that we've talked about and perhaps a bit more, like the recording
>> that you mentioned, possibly one where they can watch the agent learn
>> through each iteration and see it become better.
>>
>> Thanks,
>> Sahith
>>
>> On Fri, Mar 2, 2018 at 6:11 PM Marcus Edel <[email protected]> wrote:
>> Hello Sahith,
>>
>>> This looks very feasible along with being cool and intuitive. We could
>>> implement a system where a user who is a beginner can just choose an
>>> environment and input a particular pre-built method and can compare
>>> different methods through visualizations and the actual emulation of the
>>> game environment. Other users can have more control and call only specific
>>> functions of the API which they need and can modify everything, and these
>>> people would be the ones who would most benefit from having a leaderboard
>>> for comparison between other users on OpenAI gym.
>>
>> I think merging ideas from both sides is a neat idea; the first step should
>> focus on the existing mlpack methods, provide pre-trained models for
>> specific parameter sets and output some metrics. Providing a recording of
>> the environment is also a neat feature. Note the optimizer visualization
>> allows a user to fine-control the optimizer parameters, but only because the
>> time to find a solution is low; in case of RL methods we are talking about
>> minutes or hours, so providing pre-trained models is essential. If you like
>> the idea, we should think about some meaningful metrics, besides the number
>> of iterations.
>>
>> For other frameworks, one idea is to provide an API to upload the results;
>> based on the information, we could generate the metrics.
>>
>> Let me know what you think.
>>
>> Thanks,
>> Marcus
>>
>>> On 2. Mar 2018, at 13:08, Sahith D <[email protected]> wrote:
>>>
>>> Hi Marcus,
>>> This looks very feasible along with being cool and intuitive. We could
>>> implement a system where a user who is a beginner can just choose an
>>> environment and input a particular pre-built method and can compare
>>> different methods through visualizations and the actual emulation of the
>>> game environment. Other users can have more control and call only specific
>>> functions of the API which they need and can modify everything, and these
>>> people would be the ones who would most benefit from having a leaderboard
>>> for comparison between other users on OpenAI gym.
>>> Though I would like to know how in depth you would want this to be. The
>>> optimizer tutorial seems to have pretty much all the major optimizers
>>> currently being used.
>>> Do you think we should try something that's as extensive or just set up a
>>> framework for future contributors?
>>>
>>> Thanks,
>>> Sahith
>>>
>>> On Thu, Mar 1, 2018 at 3:35 PM Marcus Edel <[email protected]> wrote:
>>> Hello Sahith,
>>>
>>> I like the idea, also since OpenAI abandoned the leaderboard this could be
>>> a great opportunity. I'm a fan of giving a user the opportunity to test the
>>> methods without much hassle, so one idea is to provide an interface for the
>>> web that exposes a minimal set of settings, something like:
>>>
>>> www.mlpack.org/docs/mlpack-git/doxygen/optimizertutorial.html
>>>
>>> Let me know what you think; there are a bunch of interesting features that
>>> we could look into, but we should make sure each is tangible and useful.
>>>
>>> Thanks,
>>> Marcus
>>>
>>>> On 28. Feb 2018, at 23:03, Sahith D <[email protected]> wrote:
>>>>
>>>> A playground-type project sounds like a great idea. We could start with
>>>> using the current Q-Learning method already present in the mlpack
>>>> repository and then apply it to environments in gym as a sort of
>>>> tutorial. We could then move on to more complex methods like Double
>>>> Q-Learning and Monte Carlo Tree Search (just suggestions) just to get
>>>> started, so that more people will get encouraged to try their hand at
>>>> solving the environments in more creative ways using C++, as the Python
>>>> community is already pretty strong. If we could build something of a
>>>> leaderboard similar to what OpenAI gym already has, then it could foster a
>>>> creative community of people who want to try more RL. Does this sound good
>>>> or can it be improved upon?
>>>>
>>>> Thanks,
>>>> Sahith.
>>>>
>>>> On Wed, Feb 28, 2018 at 3:50 PM Marcus Edel <[email protected]> wrote:
>>>> Hello Sahith,
>>>>
>>>>> 1. We could implement all the fundamental RL algorithms like those over
>>>>> here: https://github.com/dennybritz/reinforcement-learning. This
>>>>> repository contains nearly all the algorithms that are useful for RL
>>>>> according to David Silver's RL course. They're all currently in Python,
>>>>> so it could just be a matter of porting them over to use mlpack.
>>>>
>>>> I don't think implementing all the methods is something we should pursue
>>>> over the summer; writing the method itself and coming up with some
>>>> meaningful tests takes time. Also, in my opinion, instead of implementing
>>>> all methods, we should pick methods that make sense in a specific context
>>>> and make them as fast and easy to use as possible.
>>>>
>>>>> 2. We could implement fewer algorithms but work more on solving the
>>>>> OpenAI gym environments using them. This would require tighter
>>>>> integration of the gym wrapper that you have already written. If enough
>>>>> environments can be solved, then this could become a viable C++ library
>>>>> for comparing RL algorithms in the future.
>>>>
>>>> I like the idea, this could be a great way to present the RL
>>>> infrastructure to a wider audience, in the form of a playground.
>>>>
>>>> Let me know what you think.
>>>>
>>>> Thanks,
>>>> Marcus
>>>>
>>>>> On 27. Feb 2018, at 23:01, Sahith D <[email protected]> wrote:
>>>>>
>>>>> Hi Marcus,
>>>>> Sorry for not updating you earlier, as I had some exams that I needed to
>>>>> finish first.
>>>>> I've been working on the policy gradient over in this repository, which
>>>>> you can see here: https://github.com/SND96/mlpack-rl
>>>>> I also had some ideas on what this project could be about.
>>>>>
>>>>> 1. We could implement all the fundamental RL algorithms like those over
>>>>> here: https://github.com/dennybritz/reinforcement-learning. This
>>>>> repository contains nearly all the algorithms that are useful for RL
>>>>> according to David Silver's RL course. They're all currently in Python,
>>>>> so it could just be a matter of porting them over to use mlpack.
>>>>> 2. We could implement fewer algorithms but work more on solving the
>>>>> OpenAI gym environments using them. This would require tighter
>>>>> integration of the gym wrapper that you have already written. If enough
>>>>> environments can be solved, then this could become a viable C++ library
>>>>> for comparing RL algorithms in the future.
>>>>>
>>>>> Right now I'm working on solving one of the environments in gym using a
>>>>> Deep Q-Learning approach similar to what is already there in the mlpack
>>>>> library from last year's GSoC. It's taking a bit longer than I hoped, as
>>>>> I'm still familiarizing myself with some of the server calls being made
>>>>> and how to properly get information about the environments. Would
>>>>> appreciate your thoughts on the ideas that I have and anything else that
>>>>> you had in mind.
>>>>>
>>>>> Thanks!
>>>>> Sahith
>>>>>
>>>>> On Fri, Feb 23, 2018 at 1:50 PM Sahith D <[email protected]> wrote:
>>>>> Hi Marcus,
>>>>> I've been having difficulties compiling mlpack, which has stalled my
>>>>> progress. I've opened an issue on the same and would appreciate any help.
>>>>>
>>>>> On Thu, Feb 22, 2018 at 10:09 AM Sahith D <[email protected]> wrote:
>>>>> Hey Marcus,
>>>>> No problem with the slow response, as I was familiarizing myself better
>>>>> with the codebase and the methods present in the meantime. I'll start
>>>>> working on what you mentioned. I'll notify you when I finish.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> On Thu, Feb 22, 2018 at 4:56 AM Marcus Edel <[email protected]> wrote:
>>>>> Hello Sahith,
>>>>>
>>>>> thanks for getting in touch and sorry for the slow response.
>>>>>
>>>>> > My name is Sahith. I've been working on Reinforcement Learning for the
>>>>> > past year and am interested in coding with mlpack on the RL project for
>>>>> > this summer. I've been going through the codebase and have managed to
>>>>> > get the OpenAI gym API up and running on my computer. Is there any
>>>>> > other specific task I can do while I get to know more of the codebase?
>>>>>
>>>>> Great that you got it all working; another good entry point is to write a
>>>>> simple RL method, one simple method that comes to mind is the Policy
>>>>> Gradients method.
>>>>> Another idea is to write an example for solving a gym environment with
>>>>> the existing codebase, something in the vein of the Kaggle Digit
>>>>> Recognizer example Eugene wrote
>>>>> (https://github.com/mlpack/models/tree/master/Kaggle/DigitRecognizer).
>>>>>
>>>>> Let me know if I should clarify anything.
>>>>>
>>>>> Thanks,
>>>>> Marcus
>>>>>
>>>>> > On 19. Feb 2018, at 20:41, Sahith D <[email protected]> wrote:
>>>>> >
>>>>> > Hello Marcus,
>>>>> > My name is Sahith. I've been working on Reinforcement Learning for the
>>>>> > past year and am interested in coding with mlpack on the RL project for
>>>>> > this summer. I've been going through the codebase and have managed to
>>>>> > get the OpenAI gym API up and running on my computer. Is there any
>>>>> > other specific task I can do while I get to know more of the codebase?
>>>>> > Thanks!
_______________________________________________
mlpack mailing list
[email protected]
http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
