[Corpora-List] 2-year Postdoc position at CEA-List (Paris-Saclay University, France): NLP Transformer models for protein sequences

Olivier Ferret via Corpora Fri, 13 Jan 2023 09:13:08 -0800

CEA List, a research institute of Paris-Saclay University, is looking for a 
Postdoctoral Fellow to join its laboratory of semantic analysis of texts and 
images.


In the context of the DeepGenSeq project, the person hired will join an
interdisciplinary team aiming to move closer to the goal of predictive and
generative artificial intelligence for biology by exploiting deep contextual
language models of biological sequences, whose representations generalize to
several applications such as the prediction of mutational effects.

BACKGROUND
Exponential growth in sequencing throughput, together with the sampling of
natural (uncultured) populations, is providing a deeper view of the diversity
of protein sequences across the tree of life. Proteins are molecular engines
sustaining cellular life and the unobserved determinants of their structure and 
function are encoded in the distribution of observed natural sequences. 
Therefore, such vast amounts of (unlabeled) sequences provide evolutionary data 
that can form the ground for unsupervised learning of predictive and generative 
models of biological function.

Recent advances in machine learning, with the development of the transformer 
architecture, have allowed the emergence of powerful language models that can 
be used to model protein sequences. Through transfer learning, the learned
representations can be used to detect homology (i.e. the relatedness between
two protein sequences), predict secondary and tertiary structures, predict
residue-residue contacts, or predict fluorescence landscapes.

CHALLENGES AND OBJECTIVES
Our focus here will be to develop high-capacity transformer-based language 
models on protein sequence data. Intrinsic organising principles captured in 
the resulting representations can then be applied in transfer learning settings 
to different predictive sub-tasks using limited experimental data (e.g. the 
effect of sequence variation on protein function). Following promising recent 
results, we plan to also explore zero-shot inference with no additional 
training and/or supervision from experimental data.

Responsibilities:
* Tune and optimize existing unsupervised transformer-based language models for 
protein sequences.
* Develop and optimize code and machine learning algorithms for predictive 
models.
* Integrate and analyze large data volumes.
* Interact continuously with scientists in an interdisciplinary team.

APPLICATION
This project will be an excellent opportunity for a candidate who is looking to
contribute to cutting-edge research and to train with experts in the field. We
are seeking a detail-oriented computer scientist and problem solver who is
passionate about science. This 2-year position is open to a range of candidates
from recent college graduates to more experienced scientists (e.g. post-docs).
The ideal candidate should have the following qualifications:

* Ph.D. or M.Sc. in Applied Mathematics, Computer Science, or Computational 
Biology.
* Experience in Deep Learning methods.
* Experience with Python, open-source software libraries for machine learning,
and Linux.
* Strong mathematical background and analytical skills.
* Effective organizational skills, e.g. the ability to prioritize work and 
contribute to the planning of a program of scientific research.
* Demonstrated interpersonal skills, including the ability both to work
independently and to perform collaborative research in an interdisciplinary
team environment.
* Good oral and written communication skills.

Preferred: Previous experience with transformer-based techniques for NLP
pre-training and transformer language models.

TERMS & COMPENSATION
This 2-year position is open to a range of candidates from recent college
graduates to more experienced scientists (e.g. post-docs); the chosen
candidate's salary will be commensurate with their level of education, skills,
and experience. Other benefits include:
- 48 days of paid holidays
- on-site subsidized restaurant
- partial remote work is possible, up to 3 days per week within the limit of 
100 days per year
- CEA contribution to the personal company savings plan

LOCATION
We are based on the Paris-Saclay research campus, south of Paris, France.

HOW TO APPLY
Interested candidates should submit a resume and short cover letter to 
deepgenseq «at» saxifrage.saclay.cea.fr

ABOUT US
About CEA-List: https://list.cea.fr/en/

About the LASTI lab: https://kalisteo.cea.fr/index.php/ai/
                https://kalisteo.cea.fr/index.php/textual-and-visual-semantic/

About Genoscope: https://www.genoscope.cns.fr
