Hello Chirag,

> I could implement a basic evolution strategies module within the
> src/mlpack/methods/reinforcement_learning module or as a separate module
> itself, and test it on sample functions for a start (reference:
> https://gist.github.com/karpathy/77fbb6a8dac5395f1b73e7a89300318d)

It might make sense to implement Natural Evolution Strategies as an
optimizer; see mlpack.org/docs/mlpack-git/doxygen/optimizertutorial.html and
arxiv.org/abs/1711.06581 for more information. Let me know what you think.

> All in all, I feel I can form a proper timeline to try to fit this into the
> timeframe of the summer.

Agreed, I really like the idea of combining RL with neuroevolution. Also,
https://github.com/mlpack/mlpack/wiki/Google-Summer-of-Code-Application-Guide
might be helpful.

Let me know if I should clarify anything.

Thanks,
Marcus

> On 3. Mar 2018, at 16:31, Chirag Ramdas <chiragram...@gmail.com> wrote:
> 
> Hello Marcus,
> 
> Following up on my previous email, where I mentioned finding this idea very
> interesting: https://arxiv.org/abs/1802.04821
> 
> Over the past three days, I have been going through OpenAI's blog post on
> evolution strategies as well as their paper:
> https://arxiv.org/abs/1703.03864
> https://blog.openai.com/evolution-strategies/
> 
> The blog post is very well written, and brings out the simple yet beautiful 
> way in which evolution strategies work.
> 
> As for the paper itself, where they combine evolution strategies with
> policy gradients, I feel it would be a nice addition to mlpack's existing
> code base.
> 
> I could implement a basic evolution strategies module within the
> src/mlpack/methods/reinforcement_learning module or as a separate module
> itself, and test it on sample functions for a start (reference:
> https://gist.github.com/karpathy/77fbb6a8dac5395f1b73e7a89300318d)
> 
> After that, I could go on to implement the idea suggested in the paper,
> which combines it with a policy gradient technique.
> 
> Since the paper suggests that their results are on par with state-of-the-art
> TRPO/PPO, we could also benchmark the performance of this technique on a
> standard MuJoCo environment.
> 
> All in all, I feel I can form a proper timeline to try to fit this into the
> timeframe of the summer.
> 
> Do let me know how you feel about this, and whether it appeals to you!
> 
> Thanks a lot!
> 

_______________________________________________
mlpack mailing list
mlpack@lists.mlpack.org
http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack