Hello Chirag,

> I could implement a basic evolution strategies module within the
> src/mlpack/methods/reinforcement_learning module or as a separate
> module itself, and test it on sample functions for a start (reference:
> https://gist.github.com/karpathy/77fbb6a8dac5395f1b73e7a89300318d)
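
Just to make sure we are thinking of the same starting point: a minimal
version of that sample-function test, roughly following the gist you
linked, might look something like this in C++/Armadillo. This is a
sketch only; the variable names are made up, and the hyperparameters
simply echo the gist.

// Sketch only: bare-bones evolution strategies on the toy problem from
// the gist (minimize f(w) = ||w - solution||^2).
#include <armadillo>
#include <iostream>

int main()
{
  const arma::vec solution("0.5; 0.1; -0.3");
  auto f = [&](const arma::vec& x)
  {
    return arma::accu(arma::square(x - solution));
  };

  const size_t npop = 50;      // population size
  const double sigma = 0.1;    // standard deviation of the perturbations
  const double alpha = 0.001;  // step size
  arma::vec w(3, arma::fill::randn);  // random initial guess

  for (size_t i = 0; i < 300; ++i)
  {
    // Sample npop Gaussian perturbations of w and score each of them.
    arma::mat noise(3, npop, arma::fill::randn);
    arma::vec costs(npop);
    for (size_t j = 0; j < npop; ++j)
      costs(j) = f(w + sigma * noise.col(j));

    // Standardize the costs and step against the estimated gradient.
    arma::vec weights = (costs - arma::mean(costs)) /
        (arma::stddev(costs) + 1e-8);
    w -= alpha / (npop * sigma) * (noise * weights);
  }

  std::cout << "final cost: " << f(w) << std::endl;
  std::cout << "estimate:" << std::endl << w << std::endl;
  return 0;
}

Standardizing the sampled costs before forming the update is the same
trick the gist applies to its rewards, and it keeps the step size
insensitive to the scale of the objective.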

It might make sense to implement Natural Evolution Strategies as an
optimizer; see mlpack.org/docs/mlpack-git/doxygen/optimizertutorial.html
and arxiv.org/abs/1711.06581 for more information. Let me know what you
think.
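
Concretely, I would expect the optimizer to be shaped roughly like the
skeleton below, exposing the templated Optimize() method described in
the tutorial. Again, this is only a sketch: the class name, parameters,
and defaults are placeholders, not a design.

// Sketch only: the same update wrapped as an mlpack-style optimizer.
#include <armadillo>
#include <vector>

class BasicES
{
 public:
  BasicES(const size_t populationSize = 50,
          const double sigma = 0.1,
          const double stepSize = 0.001,
          const size_t maxIterations = 300) :
      populationSize(populationSize),
      sigma(sigma),
      stepSize(stepSize),
      maxIterations(maxIterations) { }

  // Minimize any FunctionType that exposes Evaluate(const arma::mat&);
  // the result is written back into iterate and the final objective
  // value is returned.
  template<typename FunctionType>
  double Optimize(FunctionType& function, arma::mat& iterate)
  {
    std::vector<arma::mat> noise(populationSize);
    arma::vec costs(populationSize);

    for (size_t i = 0; i < maxIterations; ++i)
    {
      // Score a population of Gaussian perturbations of iterate.
      for (size_t j = 0; j < populationSize; ++j)
      {
        noise[j] = arma::mat(arma::size(iterate), arma::fill::randn);
        costs(j) = function.Evaluate(iterate + sigma * noise[j]);
      }

      // Standardize the costs, estimate the gradient, take a step.
      const arma::vec weights = (costs - arma::mean(costs)) /
          (arma::stddev(costs) + 1e-8);
      arma::mat gradient(arma::size(iterate), arma::fill::zeros);
      for (size_t j = 0; j < populationSize; ++j)
        gradient += weights(j) * noise[j];
      iterate -= stepSize / (populationSize * sigma) * gradient;
    }

    return function.Evaluate(iterate);
  }

 private:
  size_t populationSize;
  double sigma;
  double stepSize;
  size_t maxIterations;
};

The nice part of that shape is that the ES update stays decoupled from
what is being evaluated, so the same class could later be pointed at an
RL objective instead of a test function.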

> All in all, I feel I can form a proper timeline to try to fit this in
> the timeframe of the summer.

Agreed. I really like the idea of combining RL with neuroevolution; also,
https://github.com/mlpack/mlpack/wiki/Google-Summer-of-Code-Application-Guide
might be helpful. Let me know if I should clarify anything.

Thanks,
Marcus

> On 3. Mar 2018, at 16:31, Chirag Ramdas <[email protected]> wrote:
>
> Hello Marcus,
>
> Following up on my previous email, where I mentioned finding this idea
> very interesting: https://arxiv.org/abs/1802.04821
>
> In the past three days, I have been going through OpenAI's blog post on
> evolution strategies as well as their paper:
> https://arxiv.org/abs/1703.03864
> https://blog.openai.com/evolution-strategies/
>
> The blog post is very well written, and brings out the simple yet
> beautiful way in which evolution strategies work.
>
> As for the paper itself, where they combine evolution strategies with
> policy gradients, I feel it would be a nice addition to the existing
> mlpack code base.
>
> I could implement a basic evolution strategies module within the
> src/mlpack/methods/reinforcement_learning module or as a separate
> module itself, and test it on sample functions for a start (reference:
> https://gist.github.com/karpathy/77fbb6a8dac5395f1b73e7a89300318d)
>
> After that, I could go on and implement the idea suggested in the
> paper, which combines it with a policy gradient technique.
>
> Since the paper suggests that their results are on par with
> state-of-the-art TRPO/PPO, we could also benchmark the performance of
> this technique against a standard MuJoCo environment.
>
> All in all, I feel I can form a proper timeline to try to fit this in
> the timeframe of the summer.
>
> Do let me know what you feel about this, and if it appeals to you!
>
> Thanks a lot!
