Dear all,

We have collected a multilingually aligned parallel corpus of transcripts of educational videos and made it available for research purposes. The corpus covers 20 languages, and the number of parallel sentences ranges from 43k to 335k. In addition to commonly used languages (e.g. English, Spanish, French, German), it includes less frequently used European languages such as Portuguese, Bulgarian and Polish, and many resource-poor non-European languages such as Thai, Korean, Turkish and Hindi.
For machine translation, we have also provided a training set, a development set and a test set.

The corpus can be found here: http://alt.qcri.org/resources/qedcorpus/

If you have any questions, feel free to ask.

Best,
Hassan
Arabic Language Technologies - QCRI
_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support
