Hello Marcus,

> I see we could definitely introduce a metric that is related, e.g. one that counts the number of evaluations/iterations.

Yes, that seems like a good metric, though as I said before it might be a bit redundant for some environments.

> I like both ideas, we should just make sure it is manageable; as you already pointed out, the Advanced Policy Gradient method might take more time.

I'll start putting together a more concrete approach for both of these methods.

> Are you talking about the additional RL method?

Not exactly. I was referring to the wrapper that we will be building around OpenAI Gym, so I was wondering whether we will integrate that into the main mlpack repository or whether it'll be a completely separate project.

I will start adding to my proposal and send it to you for your thoughts soon.

Thanks,
Sahith
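As a first step toward the more concrete approach mentioned above, a rough sketch of the one change Double DQN makes on top of the existing Q-Learning target; the function is illustrative only and not part of the current mlpack API:

    // Double-DQN target: the online network picks the next action, the
    // periodically synced target network scores it; everything else is
    // standard DQN.  Illustrative sketch, not existing mlpack code.
    #include <armadillo>

    double DoubleDQNTarget(const arma::vec& qOnlineNext,  // Q(s', .) from the online network.
                           const arma::vec& qTargetNext,  // Q(s', .) from the target network.
                           const double reward,
                           const bool terminal,
                           const double discount)
    {
      if (terminal)
        return reward;

      // Decoupling action selection from action evaluation is the only
      // change relative to the standard Q-Learning target.
      const arma::uword bestAction = qOnlineNext.index_max();
      return reward + discount * qTargetNext(bestAction);
    }

The rest of the training loop (replay buffer, target-network sync interval) could stay exactly as in the existing Q-Learning implementation.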
> On 11. Mar 2018, at 04:54, Sahith D <[email protected]> wrote:
>
> Hello Marcus,
> Apologies for the long delay in my reply; I had my midsem examinations going on and was unable to respond.
>
> The time metric I had in mind was more related to how long the actual in-game time is, which I think is independent of the system and is part of the environment itself. However, I realized that most games already have a score that takes time into account, so this might be redundant.
>
> In one of your previous mails you mentioned we should initially focus on existing mlpack methods for the training. The only mlpack RL method currently present is a Q-Learning model from last year's GSoC, which includes policies and also experience replay. While this is good for the basic environments in OpenAI Gym, we should implement at least one more method to supplement it:
>
> 1. Double DQN could be a good fit, as it just builds on top of the current method and hence would be the best to pursue.
> 2. An advanced policy gradient method, which would take more time but could also extend the number of environments that can be solved in the future.
>
> Also, in regards to building an API, I would like to know whether you want to focus on building on top of the methods already present in mlpack and extending them as much as we can, or on building something from scratch while using the mlpack methods whenever we need them.
>
> Thanks
>
> On Sat, Mar 3, 2018 at 5:39 PM Marcus Edel <[email protected]> wrote:
>
>> Hello Sahith,
>>
>> I'm not sure about the time metric; it might be meaningless if not run on the same or a similar system. If we only compare our own methods, that should be fine though. The rest sounds reasonable to me.
>>
>> Best,
>> Marcus
>>
>> On 2. Mar 2018, at 22:34, Sahith D <[email protected]> wrote:
>>
>> Hi Marcus,
>>
>> Making pre-trained models sounds good, however we'll have to pick the most popular or easiest environments for this, at least at the start.
>> For meaningful metrics other than iterations we could use the *score* of the game, which is the best possible metric, and also the *time* it takes to reach that score. Depending on the environment, a low time or a large time could be better. The user-controlled parameters could also include:
>>
>> 1. Exploration rate / exploration rate decay
>> 2. Learning rate
>> 3. Reward size
>>
>> Perhaps a few more, but these are essential.
>>
>> I like the idea of creating an API to upload results. We could include the metrics that we've talked about and perhaps a bit more, like the recording that you mentioned, possibly one where users can watch the agent learn through each iteration and see it become better.
>>
>> Thanks,
>> Sahith
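To make the metrics discussion above a bit more concrete, a minimal sketch of a per-episode tracker; the struct is illustrative and nothing here exists in mlpack yet. Wall-clock time is only meaningful between runs on the same system, per the caveat above:

    // Per-episode metrics: step (evaluation) count, accumulated score, and
    // optional wall-clock time.  Illustrative sketch only.
    #include <chrono>
    #include <cstddef>
    #include <iostream>

    struct EpisodeMetrics
    {
      std::size_t steps = 0;
      double score = 0.0;
      std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();

      // Record one environment step and its reward.
      void Step(const double reward) { ++steps; score += reward; }

      // Print the episode summary; seconds are only comparable between runs
      // on the same machine, unlike steps and score.
      void Report(const std::size_t episode) const
      {
        const double seconds = std::chrono::duration<double>(
            std::chrono::steady_clock::now() - start).count();
        std::cout << "episode " << episode << ": steps = " << steps
                  << ", score = " << score << ", seconds = " << seconds << "\n";
      }
    };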
>> On Fri, Mar 2, 2018 at 6:11 PM Marcus Edel <[email protected]> wrote:
>>
>>> Hello Sahith,
>>>
>>> This looks very feasible along with being cool and intuitive. We could implement a system where a user who is a beginner can just choose an environment and a particular pre-built method, and can compare different methods through visualizations and the actual emulation of the game environment. Other users can have more control and call only the specific functions of the API which they need and can modify everything; these people would be the ones who would most benefit from having a leaderboard for comparison with other users on OpenAI Gym.
>>>
>>> I think merging ideas from both sides is a neat idea; the first step should focus on the existing mlpack methods, provide pre-trained models for specific parameter sets, and output some metrics. Providing a recording of the environment is also a neat feature. Note the optimizer visualization allows a user to finely control the optimizer parameters, but only because the time to find a solution is low; in the case of RL methods we are talking about minutes or hours, so providing pre-trained models is essential. If you like the idea, we should think about some meaningful metrics besides the number of iterations.
>>>
>>> For other frameworks, one idea is to provide an API to upload the results; based on that information, we could generate the metrics.
>>>
>>> Let me know what you think.
>>>
>>> Thanks,
>>> Marcus
>>>
>>> On 2. Mar 2018, at 13:08, Sahith D <[email protected]> wrote:
>>>
>>> Hi Marcus,
>>> This looks very feasible along with being cool and intuitive. We could implement a system where a user who is a beginner can just choose an environment and a particular pre-built method, and can compare different methods through visualizations and the actual emulation of the game environment. Other users can have more control and call only the specific functions of the API which they need and can modify everything; these people would be the ones who would most benefit from having a leaderboard for comparison with other users on OpenAI Gym.
>>> Though I would like to know how in-depth you would want this to be. The optimizer tutorial seems to have pretty much all the major optimizers currently in use. Do you think we should try something that's as extensive, or just set up a framework for future contributors?
>>>
>>> Thanks,
>>> Sahith
>>>
>>> On Thu, Mar 1, 2018 at 3:35 PM Marcus Edel <[email protected]> wrote:
>>>
>>>> Hello Sahith,
>>>>
>>>> I like the idea; also, since OpenAI abandoned the leaderboard this could be a great opportunity. I'm a fan of giving a user the opportunity to test the methods without much hassle, so one idea is to provide an interface for the web that exposes a minimal set of settings, something like:
>>>>
>>>> www.mlpack.org/docs/mlpack-git/doxygen/optimizertutorial.html
>>>>
>>>> Let me know what you think; there are a bunch of interesting features that we could look into, but we should make sure each is tangible and useful.
>>>>
>>>> Thanks,
>>>> Marcus
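For the minimal set of settings such a web interface could expose, something along these lines might be enough to start with; the names and defaults are assumptions for discussion, not existing mlpack parameters:

    // User-controlled knobs matching the list discussed above: exploration
    // rate and its decay, learning rate, and reward scaling.  Illustrative
    // sketch only.
    #include <algorithm>

    struct AgentSettings
    {
      double explorationRate = 1.0;     // Initial epsilon for epsilon-greedy.
      double explorationDecay = 0.995;  // Multiplicative decay per episode.
      double explorationFloor = 0.05;   // Keep a little exploration forever.
      double learningRate = 0.001;      // Step size for the network update.
      double rewardScale = 1.0;         // Optional scaling of the raw reward.

      // Apply the decay once per episode, clamped at the floor.
      void DecayExploration()
      {
        explorationRate = std::max(explorationFloor,
                                   explorationRate * explorationDecay);
      }
    };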
>>>> On 28. Feb 2018, at 23:03, Sahith D <[email protected]> wrote:
>>>>
>>>> A playground-type project sounds like a great idea. We could start by using the Q-Learning method already present in the mlpack repository and then apply it to environments in Gym as a sort of tutorial. We could then move on to more complex methods like Double Q-Learning and Monte Carlo Tree Search (just suggestions) to get started, so that more people are encouraged to try their hand at solving the environments in more creative ways using C++, as the Python community is already pretty strong. If we could build something like a leaderboard, similar to what OpenAI Gym already has, then it could foster a creative community of people who want to try more RL. Does this sound good, or can it be improved upon?
>>>>
>>>> Thanks,
>>>> Sahith.
>>>>
>>>> On Wed, Feb 28, 2018 at 3:50 PM Marcus Edel <[email protected]> wrote:
>>>>
>>>>> Hello Sahith,
>>>>>
>>>>> 1. We could implement all the fundamental RL algorithms like those over here https://github.com/dennybritz/reinforcement-learning . This repository contains nearly all the algorithms that are useful for RL according to David Silver's RL course. They're all currently in Python, so it could just be a matter of porting them over to use mlpack.
>>>>>
>>>>> I don't think implementing all the methods is something we should pursue over the summer; writing the method itself and coming up with some meaningful tests takes time. Also, in my opinion, instead of implementing all methods we should pick methods that make sense in a specific context and make them as fast and easy to use as possible.
>>>>>
>>>>> 2. We could implement fewer algorithms but work more on solving the OpenAI Gym environments using them. This would require tighter integration of the gym wrapper that you have already written. If enough environments can be solved, then this could become a viable C++ library for comparing RL algorithms in the future.
>>>>>
>>>>> I like the idea; this could be a great way to present the RL infrastructure to a wider audience, in the form of a playground.
>>>>>
>>>>> Let me know what you think.
>>>>>
>>>>> Thanks,
>>>>> Marcus
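Since tighter integration of the gym wrapper keeps coming up, a sketch of the episode loop the wrapper would need to support; GymClient and its methods are placeholders for whatever interface the real wrapper around the OpenAI Gym server ends up exposing, and the stub bodies just fake a short episode so the sketch is self-contained:

    // Hypothetical client interface for the gym wrapper; not the actual API.
    #include <armadillo>
    #include <random>

    class GymClient
    {
     public:
      // Start a new episode and return the first observation.
      arma::vec Reset() { steps = 0; return arma::zeros<arma::vec>(4); }

      // Apply an action; the real wrapper would forward this to the gym server.
      arma::vec Step(const int /* action */, double& reward, bool& done)
      {
        reward = 1.0;
        done = (++steps >= 200);
        return arma::randu<arma::vec>(4);
      }

      int ActionCount() const { return 2; }

     private:
      int steps = 0;
    };

    // One episode with a random policy; a real agent would query its own
    // policy instead of sampling uniformly.
    double RunEpisode(GymClient& env)
    {
      std::mt19937 rng(std::random_device{}());
      std::uniform_int_distribution<int> pick(0, env.ActionCount() - 1);

      double reward = 0.0, score = 0.0;
      bool done = false;
      arma::vec observation = env.Reset();

      while (!done)
      {
        observation = env.Step(pick(rng), reward, done);
        score += reward;
      }
      return score;
    }

Keeping the agent code written against a small interface like this would let the same training loop run against either a local stub or the remote gym server.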
>>>>> On 27. Feb 2018, at 23:01, Sahith D <[email protected]> wrote:
>>>>>
>>>>> Hi Marcus,
>>>>> Sorry for not updating you earlier, as I had some exams that I needed to finish first. I've been working on the policy gradient in this repository, which you can see here: https://github.com/SND96/mlpack-rl
>>>>> I also had some ideas on what this project could be about.
>>>>>
>>>>> 1. We could implement all the fundamental RL algorithms like those over here https://github.com/dennybritz/reinforcement-learning . This repository contains nearly all the algorithms that are useful for RL according to David Silver's RL course. They're all currently in Python, so it could just be a matter of porting them over to use mlpack.
>>>>> 2. We could implement fewer algorithms but work more on solving the OpenAI Gym environments using them. This would require tighter integration of the gym wrapper that you have already written. If enough environments can be solved, then this could become a viable C++ library for comparing RL algorithms in the future.
>>>>>
>>>>> Right now I'm working on solving one of the environments in Gym using a Deep Q-Learning approach similar to what is already in the mlpack library from last year's GSoC. It's taking a bit longer than I hoped, as I'm still familiarizing myself with some of the server calls being made and how to properly get information about the environments. I would appreciate your thoughts on the ideas I have and anything else that you had in mind.
>>>>>
>>>>> Thanks!
>>>>> Sahith
>>>>>
>>>>> On Fri, Feb 23, 2018 at 1:50 PM Sahith D <[email protected]> wrote:
>>>>>
>>>>>> Hi Marcus,
>>>>>> I've been having difficulties compiling mlpack, which has stalled my progress. I've opened an issue on the same and would appreciate any help.
>>>>>>
>>>>>> On Thu, Feb 22, 2018 at 10:09 AM Sahith D <[email protected]> wrote:
>>>>>>
>>>>>>> Hey Marcus,
>>>>>>> No problem with the slow response, as I was familiarizing myself better with the codebase and the methods present in the meantime. I'll start working on what you mentioned and will notify you when I finish.
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> On Thu, Feb 22, 2018 at 4:56 AM Marcus Edel <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hello Sahith,
>>>>>>>>
>>>>>>>> Thanks for getting in touch and sorry for the slow response.
>>>>>>>>
>>>>>>>> > My name is Sahith. I've been working on Reinforcement Learning for the past year and am interested in coding with mlpack on the RL project for this summer. I've been going through the codebase and have managed to get the OpenAI Gym API up and running on my computer. Is there any other specific task I can do while I get to know more of the codebase?
>>>>>>>>
>>>>>>>> Great that you got it all working. Another good entry point is to write a simple RL method; one simple method that comes to mind is the Policy Gradients method. Another idea is to write an example for solving a Gym environment with the existing codebase, something in the vein of the Kaggle Digit Recognizer Eugene wrote (https://github.com/mlpack/models/tree/master/Kaggle/DigitRecognizer).
>>>>>>>>
>>>>>>>> Let me know if I should clarify anything.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Marcus
>>>>>>>>
>>>>>>>> > On 19. Feb 2018, at 20:41, Sahith D <[email protected]> wrote:
>>>>>>>> >
>>>>>>>> > Hello Marcus,
>>>>>>>> > My name is Sahith. I've been working on Reinforcement Learning for the past year and am interested in coding with mlpack on the RL project for this summer. I've been going through the codebase and have managed to get the OpenAI Gym API up and running on my computer. Is there any other specific task I can do while I get to know more of the codebase?
>>>>>>>> > Thanks!
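For the Policy Gradients entry point suggested in the 22 Feb mail above, a minimal sketch of the REINFORCE return computation; the gradient step itself would be done by whatever network and optimizer get plugged in:

    // Turn per-step rewards into discounted returns G_t = r_t + gamma * G_{t+1}.
    // Illustrative sketch of the REINFORCE bookkeeping, not existing mlpack code.
    #include <cstddef>
    #include <vector>

    std::vector<double> DiscountedReturns(const std::vector<double>& rewards,
                                          const double gamma)
    {
      std::vector<double> returns(rewards.size(), 0.0);
      double running = 0.0;
      for (std::size_t i = rewards.size(); i-- > 0; )
      {
        running = rewards[i] + gamma * running;
        returns[i] = running;
      }
      return returns;
    }

    // The policy update then ascends sum_t G_t * grad log pi(a_t | s_t); with a
    // neural network policy this means backpropagating -G_t * log pi(a_t | s_t)
    // as the per-step loss.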
