Dear all,

We have collected a multilingually aligned parallel corpus of educational
video transcripts and made it available for research purposes. The corpus
covers 20 languages, and the number of parallel sentences ranges from 43k
to 335k. In addition to commonly used languages (e.g. English, Spanish,
French, German), it includes less frequently used European languages such
as Portuguese, Bulgarian and Polish, as well as many resource-poor
non-European languages such as Thai, Korean, Turkish and Hindi.

For machine translation, we also provide a training set, a development
set and a test set.

The corpus can be found here:

http://alt.qcri.org/resources/qedcorpus/

If you have any questions, feel free to ask.

Best,
Hassan
Arabic Language Technologies - QCRI
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support