Re: [Moses-support] Multilingually Sentence-Aligned Corpora

Marcin Junczys-Dowmunt Fri, 22 Jan 2016 07:41:30 -0800

Hi Graham,

At the UN we are now working to release an official version of our data. As a bonus to the pair-wise alignment, it will contain a 6-way fully aligned subcorpus for English, French, Spanish, Russian, Chinese, Arabic; about 13M segments per language. We are waiting for some LREC feedback and the official greenlight from UN officials, but that should be a matter of a couple of weeks now (maybe one, maybe two, maybe four). Once it is ready I can make an announcement here.

Best,
Marcin


On 22.01.2016 at 16:26, Graham Neubig wrote:

Dear Moses Mailing List,
This is not directly related to Moses, but I was wondering if there are any high-quality, multi-lingually sentence aligned corpora available (i.e. 3 or more languages with aligned sentences). We're aware of the Europarl and Bible corpora, but Europarl only covers European languages, and the Bible corpus is quite small in MT terms.
TED and MULTI-UN are options, but as far as I know the data is only bilingually aligned at the moment, and it can be a bit hard to get a clean multi-lingual corpus from them. If anyone has any experience with this, or resource available, I'd love some info.
Thanks in advance,
Graham


_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support

