Final Call for Participation
WMT 2020 Shared Task
Parallel Corpus Filtering and Alignment for Low-Resource Conditions
Deadline: Saturday, August 1, 2020

We announce and call for participation in the WMT 2020 shared task on
assessing the quality of sentence pairs in a parallel corpus.

   - In the WMT18 shared task on parallel corpus filtering, we posed
   the challenge of a noisy web-crawled parallel corpus for German-English
   and asked participants to score each sentence pair. These quality scores
   were used to select subsets of the corpus, consisting of the
   highest-scoring sentence pairs, train statistical and neural machine
   translation systems on them, and evaluate these on a set of test sets.
   - In the WMT19 shared task on parallel corpus filtering for low-resource
   conditions, we followed the same protocol, but this time for
   Nepali-English and Sinhala-English. For low-resource language pairs like
   these, both the existing clean parallel corpora and the to-be-scored noisy
   web-crawled data come in smaller amounts and are of lower quality.
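
As an illustration of the selection step in this protocol, here is a
minimal sketch of how per-pair quality scores might be turned into a
subset of the highest-scoring sentence pairs. The function name
select_top_scoring, the max_pairs cutoff, and the tie-breaking rule are
assumptions made for this example only; they are not the official task
tooling or its actual subset sizes.

```python
def select_top_scoring(pairs, scores, max_pairs):
    """Return the max_pairs highest-scoring sentence pairs, best first.

    pairs     -- list of (source, target) sentence pairs
    scores    -- one quality score per pair (higher means better)
    max_pairs -- hypothetical cutoff on the subset size
    """
    if len(pairs) != len(scores):
        raise ValueError("need exactly one score per sentence pair")
    # Rank by descending score; ties keep the original corpus order.
    ranked = sorted(range(len(pairs)), key=lambda i: (-scores[i], i))
    return [pairs[i] for i in ranked[:max_pairs]]
```

A cutoff on the number of pairs is only one possible choice; a budget
on the total number of words could be used in the same way.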

This year, the task covers two new language pairs, Khmer-English and
Pashto-English. In addition to computing quality scores for the purpose of
filtering, we also allow participants to re-align sentence pairs from
document pairs.

Submission deadline for subsampled sets: August 1, 2020
System descriptions due: August 15, 2020
Announcement of results: August 29, 2020
Paper notification: September 29, 2020
Camera-ready for system descriptions: October 10, 2020