Please find below an offer for an internship from February/March to July/August
2024 in the Mnemosyne team at Inria, Bordeaux, France.

# Title
Evolving Reservoirs for Meta Reinforcement Learning

# Supervision
Xavier Hinaut, Mnemosyne team, Inria Bordeaux
Potential collaboration with Clément Moulin-Frier (Flowers Team) & Pierrick 
Legrand (Astral Team)

# Duration:
6 months, spring/summer 2024. Level: Master 2 research internship

# Keywords:
meta reinforcement learning, reservoir computing, evolutionary computation, 
neuroevolution, simulation environments, scientific programming with Python.

# Research team & internship location
Mnemosyne team: Inria Bordeaux Sud-Ouest, LaBRI & Institut des Maladies
Neurodégénératives (Centre Broca Aquitaine, Carreire campus)
https://team.inria.fr/mnemosyne

# Project
How can a reinforcement learning (RL) agent autonomously learn a diversity of 
skills with minimal external supervision? Recent works have shown that a single 
goal-conditioned policy can be trained to solve multiple tasks in parallel (see 
e.g. Colas et al., 2021 for a recent review). Other works in Meta Reinforcement 
Learning (Meta-RL) have shown that artificial agents can “learn how to learn” 
(hence the term meta-learning) multiple skills without having access to 
goal-related information (see e.g. Weng, 2019 for an introduction).

Meta-RL aims at equipping agents with the ability to generalize to tasks or 
environments that have not been encountered during training. Two nested 
processes of adaptation are traditionally considered: an outer adaptation loop 
optimizing the hyperparameters of an inner adaptation loop. The inner 
adaptation loop can be a standard RL algorithm operating on a given 
environment. The outer loop tunes the hyperparameters of the inner loop so that
it performs well on a wide distribution of environments. The end
result of this nested training process is an algorithm that learns how to 
learn, i.e. learns how to adapt to tasks that have never been encountered 
during training. Works differ in the optimization technique and adaptation 
mechanism they consider in the two loops. The outer loop often optimizes 
higher-level structures such as the architecture of a neural network (Baker et 
al., 2017), the morphology (Gupta et al., 2021), a curiosity module providing 
intrinsic rewards (Alet et al., 2020) or plasticity rules (Najarro & Risi, 
2021), employing either evolutionary or gradient-based optimization.
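
To make this nested structure concrete, here is a minimal, self-contained toy
sketch (it is not any of the algorithms cited above, and all function and
variable names are purely illustrative): the outer loop performs a simple
evolutionary search over a single inner-loop hyperparameter (a learning rate),
and each candidate is scored by running a toy inner adaptation loop on freshly
sampled tasks.

```python
import random

def sample_env():
    """Stand-in for sampling a task from a distribution of environments
    (here: a trivial task defined by a hidden target value)."""
    return {"target": random.uniform(-1.0, 1.0)}

def inner_rl_loop(env, theta, n_steps=100):
    """Inner adaptation loop: a toy learning procedure whose behaviour
    depends on the hyperparameter theta (here a single learning rate)."""
    estimate, total_reward = 0.0, 0.0
    for _ in range(n_steps):
        reward = -abs(estimate - env["target"])                 # closer is better
        total_reward += reward
        estimate += theta["lr"] * (env["target"] - estimate)    # adapt
    return total_reward / n_steps

def mutate(theta):
    return {"lr": max(1e-4, theta["lr"] + random.gauss(0.0, 0.05))}

def meta_train(n_generations=30, pop_size=16):
    """Outer adaptation loop: evolutionary search over the inner loop's
    hyperparameters, scored on freshly sampled environments."""
    population = [{"lr": random.uniform(0.0, 0.5)} for _ in range(pop_size)]
    best = population[0]
    for _ in range(n_generations):
        scores = [sum(inner_rl_loop(sample_env(), t) for _ in range(5)) / 5
                  for t in population]
        ranked = [t for _, t in sorted(zip(scores, population),
                                       key=lambda p: -p[0])]
        best, survivors = ranked[0], ranked[: pop_size // 2]
        population = survivors + [mutate(t) for t in survivors]
    return best

print("best inner-loop hyperparameters:", meta_train())
```

The result is the hyperparameter setting whose inner loop adapts best on
average over the task distribution, which is the "learning how to learn"
objective described above.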

In parallel, Reservoir Computing (RC; Jaeger, 2007) is particularly well suited
for extracting information from data streams with complex temporal dynamics
(e.g. weather forecasting or language). It is a machine learning paradigm for
sequential data in which a recurrent neural network is only partially trained
(e.g. only a linear readout is trained).


One of the major advantages of these recurrent neural networks is their reduced
computational cost and the possibility to learn in both online and offline
fashion. They have been successfully applied to a wide variety of tasks, from
prediction/generation of chaotic timeseries to discrimination of audio
sequences, such as bird song recognition. They thus offer a fast, simple yet
efficient way to train RNNs. This "reservoir of computations" works thanks to
random projections into a high-dimensional space, and is in this respect
similar to temporal Support Vector Machines (SVMs).
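
As an illustration of the paradigm, here is a minimal sketch of training an
echo state network on a one-step-ahead prediction task with ReservoirPy. It
assumes the ReservoirPy 0.3 node API (Reservoir, Ridge, the >> composition
operator and the bundled mackey_glass generator); exact argument names may
differ between versions.

```python
import numpy as np
from reservoirpy.nodes import Reservoir, Ridge
from reservoirpy.datasets import mackey_glass

# Chaotic timeseries: learn to predict the next value from the current one.
X = mackey_glass(2000)                          # shape (2000, 1)
X_train, y_train = X[:1500], X[1:1501]          # one-step-ahead targets
X_test, y_test = X[1500:-1], X[1501:]

# Random recurrent reservoir: its weights stay fixed, only the readout is trained.
reservoir = Reservoir(500, lr=0.3, sr=1.25)     # leak rate and spectral radius
readout = Ridge(ridge=1e-6)                     # linear (ridge regression) readout

esn = reservoir >> readout                      # echo state network
esn = esn.fit(X_train, y_train)                 # offline training of the readout only

y_pred = esn.run(X_test)
print("test MSE:", float(np.mean((y_pred - y_test) ** 2)))
```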

In this internship, we will explore how Reservoir Computing can be leveraged in
the context of Meta-RL. In particular, we will consider evolving the
architecture of a reservoir so that it performs well on a wide range of RL
tasks (see Seoane, 2019 for theoretical arguments on this evolutionary
perspective). To this end, we will use the ReservoirPy Python library developed
in the Mnemosyne team (Trouvain & Hinaut, 2022), which we will interface with
evolutionary computation algorithms (e.g. Tang et al., 2022). The project will
involve:

● Designing a wide range of RL tasks in an existing simulation environment
(e.g. MuJoCo);
● Learning how to learn multiple tasks without access to the goal information
(i.e. meta-RL with a reservoir);
● Evolving reservoir architectures for meta-RL with evolutionary methods (a toy
sketch of such an evolutionary loop is given below).
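
As a rough illustration of the last point, the sketch below evolves two global
reservoir hyperparameters (spectral radius and leak rate) together with a
linear readout acting as the policy, and evaluates each candidate on
CartPole-v1 from Gymnasium (a simpler stand-in for MuJoCo). It uses a
hand-rolled (mu, lambda)-style evolution strategy rather than a dedicated
neuroevolution library, and assumes ReservoirPy's per-timestep node calls and
reset(); all sizes and names are illustrative, not the project's final setup.

```python
import numpy as np
import gymnasium as gym
from reservoirpy.nodes import Reservoir

N_UNITS = 100               # reservoir size
N_OBS = 4                   # CartPole-v1 observation dimension
GENOME_SIZE = 2 + N_UNITS   # [spectral radius, leak rate, readout weights]

def evaluate(genome, n_episodes=3, seed=0):
    """Fitness of one candidate: build a reservoir from the evolved
    hyperparameters, use the evolved linear readout over its state as the
    policy, and return the mean episodic return on CartPole-v1."""
    sr = np.exp(genome[0])                       # keep spectral radius > 0
    lr = 1.0 / (1.0 + np.exp(-genome[1]))        # keep leak rate in (0, 1)
    readout = genome[2:]
    reservoir = Reservoir(N_UNITS, sr=sr, lr=lr, seed=seed)
    env = gym.make("CartPole-v1")
    returns = []
    for ep in range(n_episodes):
        obs, _ = env.reset(seed=seed + ep)
        reservoir.reset()
        done, total = False, 0.0
        while not done:
            state = reservoir(obs.reshape(1, -1))        # one reservoir step
            action = int(readout @ state.ravel() > 0.0)  # linear readout -> action
            obs, reward, terminated, truncated, _ = env.step(action)
            total += reward
            done = terminated or truncated
        returns.append(total)
    env.close()
    return float(np.mean(returns))

# Simple (mu, lambda)-style evolution strategy over the genome.
rng = np.random.default_rng(0)
population = rng.normal(0.0, 0.5, size=(16, GENOME_SIZE))
for generation in range(20):
    fitness = np.array([evaluate(g) for g in population])
    elite = population[np.argsort(fitness)[-4:]]          # keep the 4 best
    population = (np.repeat(elite, 4, axis=0)
                  + rng.normal(0.0, 0.1, (16, GENOME_SIZE)))
    print(f"gen {generation}: best return = {fitness.max():.1f}")
```

In the internship itself, richer genome encodings (e.g. architectural
parameters of the reservoir) and a distribution of tasks would replace this toy
setup.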


# How to apply
Contact [email protected] with a CV and a letter of motivation. We also
recommend sending documents or reports describing previous projects you have
worked on (even if they are not directly related to the topic), as well as your
grades and links to some of your code repositories.

# Requirements
We are looking for highly motivated MSc students (Master 2). Programming skills
and prior experience with Python and deep learning frameworks (PyTorch,
TensorFlow) are expected.

# Other internship offers
https://github.com/neuronalX/internships/

# References
Alet, F., Schneider, M. F., Lozano-Perez, T., & Kaelbling, L. P. (2020). 
Meta-learning curiosity algorithms. International Conference on Learning 
Representations (ICLR 2020). https://openreview.net/forum?id=BygdyxHFDS
Baker, B., Gupta, O., Naik, N., & Raskar, R. (2017). Designing neural network 
architectures using reinforcement learning. ArXiv:1611.02167 [Cs]. 
http://arxiv.org/abs/1611.02167
Botvinick, M., Ritter, S., Wang, J. X., Kurth-Nelson, Z., Blundell, C., & 
Hassabis, D. (2019). Reinforcement Learning, Fast and Slow. Trends in Cognitive 
Sciences, 23(5), 408–422. https://doi.org/10.1016/j.tics.2019.02.006
Colas, C., Karch, T., Sigaud, O., & Oudeyer, P.-Y. (2021). Intrinsically 
Motivated Goal-Conditioned Reinforcement Learning: A Short Survey. 
ArXiv:2012.09830 [Cs]. http://arxiv.org/abs/2012.09830
Gupta, A., Savarese, S., Ganguli, S., & Fei-Fei, L. (2021). Embodied 
intelligence via learning and evolution. Nature Communications, 12(1), 5721. 
https://doi.org/10.1038/s41467-021-25874-z
Jaeger, H. (2007). Echo state network. Scholarpedia, 2(9), 2330. 
https://doi.org/10.4249/scholarpedia.2330
Najarro, E., & Risi, S. (2021). Meta-Learning through Hebbian Plasticity in 
Random Networks. ArXiv:2007.02686 [Cs]. http://arxiv.org/abs/2007.02686
Sakemi, Y., Morino, K., Leleu, T., & Aihara, K. (2020). Model-size reduction 
for reservoir computing by concatenating internal states through time. 
Scientific Reports, 10(1), Article 1. https://doi.org/10.1038/s41598-020-78725-0
Seoane, L. F. (2019). Evolutionary aspects of reservoir computing. 
Philosophical Transactions of the Royal Society B: Biological Sciences, 
374(1774), 20180377. https://doi.org/10.1098/rstb.2018.0377
Tang, Y., Tian, Y., & Ha, D. (2022). EvoJAX: Hardware-Accelerated 
Neuroevolution. Proceedings of the Genetic and Evolutionary Computation 
Conference Companion, 308–311. https://doi.org/10.1145/3520304.3528770
Trouvain, N., & Hinaut, X. (2022). reservoirpy: A Simple and Flexible Reservoir 
Computing Tool in Python. https://hal.inria.fr/hal-03699931 
https://github.com/reservoirpy/reservoirpy
Weng, L. (2019, June 23). Meta Reinforcement Learning. 
https://lilianweng.github.io/posts/2019-06-23-meta-rl/


Xavier Hinaut
Inria Research Scientist
www.xavierhinaut.com -- +33 5 33 51 48 01
Mnemosyne team, Inria, Bordeaux, France -- https://team.inria.fr/mnemosyne
& LaBRI, Bordeaux University --  
https://www4.labri.fr/en/formal-methods-and-models
& IMN (Neurodegeneratives Diseases Institute) -- http://www.imn-bordeaux.org/en
---
Our Reservoir Computing library: https://github.com/reservoirpy/reservoirpy
