Announcing the WMT20 shared tasks on Unsupervised Machine Translation
and Very Low Resource Supervised Machine Translation.

There is no machine translation available for most of the ~7000
languages spoken on Earth, because very limited or no parallel
corpora are available for them. Research on unsupervised and very
low resource machine translation is important for alleviating this
problem. Unsupervised machine translation requires only monolingual
data, while very low resource supervised machine translation uses very
limited parallel data.

At WMT 2018 and WMT 2019, the first and second shared tasks on
Unsupervised Machine Translation (UMT) were held as part of the
news translation track. In 2018, the language pairs were
Turkish-English, Estonian-English and German-English. In 2019, we also
tested "simulated" unsupervised systems for German to Czech
translation (where no German/Czech parallel data was allowed).

We now propose a third edition of the UMT task, which aims at a more
realistic scenario: German to Upper Sorbian (and Upper Sorbian to
German) translation. Upper Sorbian is a minority language of Germany
in the Slavic language family (related to, e.g., Lower Sorbian, Czech
and Polish). As far as we know, the data we provide here is most of
the digital data available for it.

As we were very recently able to obtain a very small amount of
parallel data for this language pair, we also offer a very low
resource supervised translation task.

The tasks are:

- Unsupervised Machine Translation: German to Upper Sorbian, and
Upper Sorbian to German.

- Very Low Resource Supervised Machine Translation: German to Upper
Sorbian, and Upper Sorbian to German.

For further information and train/test data, please see:

http://www.statmt.org/wmt20/unsup_and_very_low_res/

Thanks and kind regards,
Alexander Fraser
CIS, LMU Munich
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support