Call for Participation

Similar Language Translation Task at WMT 2020 (co-located with EMNLP 2020)
URL: http://www.statmt.org/wmt20/similar.html

The training/dev sets are available. The test data will be released on June 8, 
2020.

Please visit the website for more information.

Task Description

Within the MT and NLP communities, English is by far the most resource-rich 
language. MT systems are most often trained to translate texts to and from 
English, or they use English as a pivot language to translate between 
resource-poorer languages. The interest in English is reflected, for example, 
in the WMT translation tasks (e.g. News, Biomedical) which have always included 
language pairs in which texts are translated to and/or from English. With the 
widespread use of MT technology, there is growing interest in training 
systems to translate between languages other than English. One sign of this 
is the need to translate directly between pairs of similar languages. The 
main challenge here is how to exploit the similarity between the languages 
to produce accurate output despite the small amount of available parallel 
data.

Given the interest of the community in this topic, we are organizing, for the 
second time at WMT, the shared task on "Similar Language Translation" to 
evaluate the performance of state-of-the-art translation systems on 
translating between pairs of languages from the same language family. This 
year we provide participants with training and testing data for five language 
pairs from the three language families listed below. Evaluation will be 
carried out using automatic evaluation metrics and human evaluation.

Language Pairs

This year we have five pairs of similar languages from three different language 
families: Indo-Aryan, Romance, and South-Slavic. Translations will be evaluated 
in both directions (e.g. from Spanish to Catalan and from Catalan to Spanish).

- Indo-Aryan Languages: Hindi - Marathi
- Romance Languages: Spanish - Catalan and Spanish - Portuguese
- South-Slavic Languages: Slovene - Croatian and Slovene - Serbian

Organizers

Marta Costa-jussà, Universitat Politècnica de Catalunya
Magdalena Biesialska, Universitat Politècnica de Catalunya
Santanu Pal, Wipro AI Lab
Nikola Ljubešić, Jožef Stefan Institute and University of Zagreb
Marcos Zampieri, Rochester Institute of Technology

Contact

martaruizcostajussa(at)gmail.com