Dear colleagues,

Please find below an internship proposal.
Feel free to forward it to your M2 students.

Kind regards

----------------------------------------------------------------------------------------------------------------------------------
The LIG (Laboratoire d'Informatique de Grenoble) proposes the following Master 
2 level internship:

Title: Context-Aware Neural Machine Translation Evaluation

Description:
Context-Aware Neural Machine Translation (CA-NMT) [Tiedemann and Scherrer, 
2017; Laubli et al., 2018; Miculicich et al., 2018; Maruf et al., 2019; Zheng 
et al., 2020; Ma et al., 2021; Lupo et al., 2022] is currently one of the main 
research axes in NLP, with a strong impact on both academic and industrial research.
CA-NMT systems are evaluated with both "average-quality-measuring" metrics such 
as BLEU [Papineni et al., 2002] and dedicated contrastive test suites [Voita 
et al., 2019; Muller & Rios, 2018; Lopes et al., 2020].
The latter have been designed to specifically measure the degree to which CA-NMT 
systems are able to exploit context when scoring sentences to be translated in 
context. Indeed, the average translation quality measured by BLEU has been shown 
to be inadequate in this respect [Lupo et al., 2022].
When evaluating models with contrastive test suites, however, models are only 
asked to score sentences, not to translate them. The ability of models to 
use context is thus only evaluated implicitly.
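As background on this scoring protocol: a contrastive test suite reduces to checking, for each example, whether the model assigns a higher score to the correct translation than to every contrastive (incorrect) variant. A minimal sketch in Python, with hypothetical precomputed scores standing in for the model log-probabilities a real run would produce:

```python
def contrastive_accuracy(examples):
    """examples: list of (score_of_correct_translation, [scores_of_contrastive_variants]).
    An example counts as a hit when the correct variant outscores all
    contrastive ones; accuracy is the fraction of hits, as reported for
    the test suites of [Voita et al., 2019; Muller & Rios, 2018]."""
    hits = sum(1 for correct, wrong in examples
               if all(correct > w for w in wrong))
    return hits / len(examples)

# Hypothetical model log-probabilities for three test-suite examples.
suite = [(-1.2, [-2.5, -3.0]),   # correct variant wins
         (-0.8, [-0.7]),         # a contrastive variant outscores it
         (-2.0, [-2.4])]         # correct variant wins
print(contrastive_accuracy(suite))  # hits on 2 of 3 examples
```

Note that the model never produces a translation here; it only ranks candidates it is given, which is exactly the limitation the paragraph above points out.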
With the work planned in this internship, we would like to take a step forward in 
the evaluation of CA-NMT systems.
The idea is to exploit annotated data such as those already used in [Muller & Rios, 
2018; Lopes et al., 2020] to explicitly involve discourse phenomena, such as 
coreference and anaphora, in the evaluation procedure of CA-NMT models.
Such an evaluation procedure may make it possible to design more accurate and 
adequate evaluation measures for "discourse-phenomena-aware" CA-NMT systems.
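Purely as an illustration of what such a measure could look like (not the metric to be designed during the internship), one could restrict accuracy to the tokens annotated with a discourse phenomenon, comparing the system output against the reference only at those positions. All names and data below are hypothetical:

```python
def phenomenon_accuracy(hypotheses, references, annotated_positions):
    """Fraction of annotated (discourse-relevant) reference tokens that the
    system translated identically to the reference. Illustrative only: a
    real metric would need word alignment and more robust matching than
    same-index token comparison."""
    correct = total = 0
    for hyp, ref, positions in zip(hypotheses, references, annotated_positions):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        for i in positions:  # e.g. indices of anaphoric pronouns in the reference
            total += 1
            if i < len(hyp_tokens) and hyp_tokens[i] == ref_tokens[i]:
                correct += 1
    return correct / total if total else 0.0

# Toy English->French case: position 0 marks the translated pronoun, whose
# gender depends on an antecedent outside the sentence.
hyps = ["elle est rapide", "il est rapide"]
refs = ["elle est rapide", "elle est rapide"]
print(phenomenon_accuracy(hyps, refs, [[0], [0]]))  # 0.5: one pronoun wrong
```

Unlike corpus-level BLEU, such a targeted score is computed on generated translations yet isolates exactly the context-dependent decisions.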

Practical Aspects:
In this internship the student will use Machine Learning and Deep Learning 
tools to automatically annotate parallel data (at least English-French, but 
possibly also English-German and other language pairs) used for NMT with 
discourse phenomena, as well as Neural Machine Translation tools for 
automatically generating translations that will be used for CA-NMT evaluation.
Based on the annotation of discourse phenomena, we will design an adequate 
evaluation metric for CA-NMT systems, taking into account the capability of the 
system to exploit discourse phenomena. Finally, the evaluation metric will be 
tested by evaluating CA-NMT systems already available or trained from scratch 
at LIG by the student.

Profile:
Master 2 level student in Computer Science or NLP
Interested in Natural Language Processing and Deep Learning approaches
Skills in machine learning for probabilistic models
Computer science skills:
Python programming. Some knowledge of deep learning libraries such as PyTorch 
(Fairseq would be a plus).
Data manipulation and annotation

The internship may last from 5 to 6 months. It will take place at the LIG 
laboratory, GETALP team (http://lig-getalp.imag.fr/), starting from 
January/February 2022.
The student will be supervised by Marco Dinarelli (http://www.marcodinarelli.it) 
and Emmanuelle Esperança-Rodier (https://lig-membres.imag.fr/esperane/).
Interested candidates must send a CV and a motivation letter to both addresses: 
[email protected], 
[email protected].


[Tiedemann and Scherrer, 2017] Neural machine translation with extended 
context. Workshop on Discourse in Machine Translation 2017.
[Laubli et al., 2018] Has machine translation achieved human parity? A case for 
document-level evaluation. EMNLP 2018.
[Miculicich et al., 2018] Document-level neural machine translation with 
hierarchical attention networks. EMNLP 2018.
[Maruf et al., 2019] Selective attention for context-aware neural machine 
translation. NAACL 2019.
[Zheng et al., 2020] Towards making the most of context in neural machine 
translation. IJCAI 2020.
[Ma et al., 2021] A comparison of approaches to document-level machine 
translation. arXiv pre-print 2021.
[Lupo et al., 2022] Divide and rule: Effective pre-training for context-aware 
multi-encoder translation models. ACL 2022.
[Papineni et al., 2002] BLEU: a method for automatic evaluation of machine 
translation. ACL 2002.
[Voita et al., 2019] When a good translation is wrong in context: 
Context-aware machine translation improves on deixis, ellipsis, and lexical 
cohesion. ACL 2019.
[Muller & Rios, 2018] A large-scale test set for the evaluation of context-aware 
pronoun translation in neural machine translation. WMT 2018.
[Lopes et al., 2020] Document-level neural MT: A systematic comparison. EAMT 
2020.
----------------------------------------------------------------------------------------------------------------------------------
___________________________________________
Emmanuelle Esperança-Rodier
Teacher-Researcher in Computational Linguistics (Section 7)
Associate Professor (Maîtresse de Conférences - Hors Classe)

UMR 5217 - LIG (Laboratoire d’Informatique de Grenoble)
GETALP (Groupe d’Étude en Traduction Automatique/Traitement Automatisé des 
Langues et de la Parole)
Bâtiment IMAG - 700 avenue Centrale - Domaine Universitaire de 
Saint-Martin-d’Hères
04 57 42 14 92

Service des Langues UGA (UGA Language Centre)
Coordinator of English courses for the IM2AG - Mathematics 
component

_______________________________________________
Corpora mailing list -- [email protected]
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
To unsubscribe send an email to [email protected]