The University of Bordeaux invites applications for a 2-year full-time 
postdoctoral researcher position in Automatic Speech Recognition. The position 
is part of the FVLLMONTI project on efficient speech-to-speech translation for 
embedded autonomous devices, funded by the European Community.

To apply, please send by email a single PDF file containing a full CV 
(including publication list), a cover letter (describing your qualifications, 
research interests and motivation for applying), evidence of software 
development experience (an active GitHub/GitLab profile or similar), two of 
your key publications, contact information for two referees, and academic 
certificates (PhD, Diploma/Master, Bachelor).

Details on the position are given below:

Job description: Post-doctoral position in Automatic Speech Recognition 
Duration: 24 months
Starting date: as early as possible (from March 1st 2021)
Project: European FETPROACT project FVLLMONTI (starts January 2021)
Location: Bordeaux Computer Science Lab. (LaBRI CNRS UMR 5800), Bordeaux, 
France (Image and Sound team)
Salary: from EUR 2,086.45 to EUR 2,304.88/month (estimated net salary after 
taxes, according to experience)

Contact: [email protected]

Short description:
The applicant will be in charge of developing state-of-the-art Automatic 
Speech Recognition systems for English and French, as well as the related 
Machine Translation systems, using deep neural networks. The objective is to 
provide the exact specifications of the designed systems to the project 
partners specialized in hardware. Adjustments will then have to be made to 
take the hardware constraints into account (e.g. memory and energy budgets, 
which limit the number of parameters and the computation time) while keeping 
an eye on performance metrics (WER and BLEU scores). When a satisfactory 
trade-off is reached, more exploratory work will be carried out on recognizing 
emotion/attitude/affect in the speech samples to supply additional information 
to the translation system.
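
For reference, WER is the word-level edit distance between the hypothesis and 
the reference transcript, normalized by the reference length. A minimal, 
self-contained sketch in plain Python (the function name is ours, not from 
any toolkit):

    def wer(reference: str, hypothesis: str) -> float:
        # Word Error Rate: word-level edit distance / reference length.
        ref, hyp = reference.split(), hypothesis.split()
        # d[i][j] = edit distance between ref[:i] and hyp[:j]
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i  # i deletions
        for j in range(len(hyp) + 1):
            d[0][j] = j  # j insertions
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                d[i][j] = min(
                    d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]),  # sub/match
                    d[i - 1][j] + 1,                               # deletion
                    d[i][j - 1] + 1,                               # insertion
                )
        return d[len(ref)][len(hyp)] / len(ref)

    print(wer("the cat sat on the mat", "the cat sit on mat"))  # ~0.33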


Context of the project:
The aim of the FVLLMONTI project is to build a lightweight autonomous in-ear 
device for speech-to-speech translation. Today's pocket translation devices 
are IoT products that require internet connectivity, which is generally 
energy-inefficient. While machine translation (MT) and Natural Language 
Processing (NLP) performance has greatly improved, lightweight, 
energy-efficient embedded hardware remains elusive. Existing solutions based 
on artificial neural networks (NNs) are computation-intensive and 
energy-hungry, requiring server-based implementations, which also raises data 
protection and privacy concerns. Today's 2D electronic architectures suffer 
from "unscalable" interconnect and are thus still far from competing with 
biological neural systems in real-time information-processing capability at 
comparable energy consumption. Recent advances in materials science, device 
technology and synaptic architectures have the potential to fill this gap with 
disruptive technologies that go beyond conventional CMOS. A promising 
candidate is the vertical nanowire field-effect transistor (VNWFET), which 
could unlock the full potential of truly unconventional 3D circuit density 
and performance.

Role:
The tasks assigned to the Computer Science lab are the design of the Automatic 
Speech Recognition (for French and English) and Machine Translation (English 
to French and French to English) systems. Speech synthesis will not be 
explored in the project, but an open-source implementation will be used for 
demonstration purposes. Both ASR and MT benefit from Transformer architectures 
over convolutional (CNN) or recurrent (RNN) neural network architectures. The 
role of the applicant will thus be to design and implement state-of-the-art 
ASR systems using Transformer networks (e.g. with the ESPnet toolkit) and to 
assist another postdoctoral researcher with the MT systems. Once the 
performance of these baseline systems is satisfactory, details of the networks 
(e.g. number of layers, parameter values) will be given to our hardware design 
partners. Based on their feedback, the networks will be adjusted to the 
hardware constraints while limiting the degradation in performance.
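
As an illustration of the kind of detail exchanged with the hardware partners, 
the sketch below builds a small Transformer encoder in PyTorch and reports its 
parameter counts and shapes (the layer sizes are placeholders, not the 
project's actual configuration):

    import torch.nn as nn

    # Placeholder sizes for illustration; the real values would come from
    # the tuned baseline system.
    layer = nn.TransformerEncoderLayer(d_model=256, nhead=4, dim_feedforward=1024)
    encoder = nn.TransformerEncoder(layer, num_layers=12)

    total = sum(p.numel() for p in encoder.parameters())
    print(f"layers: 12, total parameters: {total:,}")

    # Per-parameter shapes: roughly the level of detail a hardware designer
    # needs to size memories and datapaths.
    for name, p in encoder.named_parameters():
        print(f"{name}: {tuple(p.shape)}")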

The second part of the project will focus on keeping up with the latest 
innovations and translating them into hardware specifications. For example, 
recent research suggests that adding convolutional layers to the Transformer 
architecture (yielding the "Conformer" network) can reduce the number of model 
parameters, which is critical for the memory usage of the hardware system.
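
A minimal sketch of how such a candidate architecture could be sized up before 
being proposed to the hardware partners, assuming the Conformer implementation 
shipped with recent torchaudio releases (the configuration values are 
placeholders, not the project's):

    import torchaudio

    # Hypothetical configuration, for illustration only.
    conformer = torchaudio.models.Conformer(
        input_dim=80,                  # e.g. 80-dim log-mel filterbank features
        num_heads=4,
        ffn_dim=1024,
        num_layers=12,
        depthwise_conv_kernel_size=31,
    )
    n_params = sum(p.numel() for p in conformer.parameters())
    print(f"Conformer parameters: {n_params:,}")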

Finally, more exploratory work will be carried out on the detection of social 
affects, i.e. the vocal expression of the speaker's intent ('politeness', 
'irony', etc.). The information gathered by this detection will be supplied 
to the translation system, for potential use in a future speech synthesis 
system.
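
One established way to feed such side information to a neural MT system is to 
prepend a pseudo-token to the source sentence, so that a model trained on 
tagged data learns to condition on it. A minimal sketch (the tag format and 
helper name are hypothetical):

    def tag_source(sentence: str, affect: str) -> str:
        # Prepend a pseudo-token carrying the detected social affect.
        return f"<{affect}> {sentence}"

    print(tag_source("pourriez-vous répéter ?", "politeness"))
    # -> <politeness> pourriez-vous répéter ?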

Required skills:
- PhD in Automatic Speech Recognition (preferred) or Machine Translation using 
deep neural networks
- Knowledge of the most widely used toolboxes/frameworks (e.g. TensorFlow, 
PyTorch, ESPnet)
- Good programming skills (Python)
- Good communication skills (frequent interactions with hardware specialists)
- Interest in hardware design is a plus

Selected references:
Karita, Shigeki, et al. "A Comparative Study on Transformer vs RNN in Speech 
Applications." 2019 IEEE Automatic Speech Recognition and Understanding 
Workshop (ASRU), Singapore, 2019, pp. 449-456. doi: 
10.1109/ASRU46091.2019.9003750.
Gulati, Anmol, et al. "Conformer: Convolution-augmented Transformer for Speech 
Recognition." arXiv preprint arXiv:2005.08100 (2020).
Rouas, Jean-Luc, et al. "Categorisation of Spoken Social Affects in Japanese: 
Human vs. Machine." ICPhS, 2019.