Hi Graham,
At the UN we are now working to release an official version of our data.
In addition to the pairwise alignments, it will contain a 6-way fully
aligned subcorpus for English, French, Spanish, Russian, Chinese, and
Arabic, with about 13M segments per language. We are waiting for some LREC
feedback and the official green light from UN officials, but that should
be a matter of a couple of weeks now (maybe one, maybe two, maybe four).
Once it is ready I can make an announcement here.
Best,
Marcin
On 22.01.2016 at 16:26, Graham Neubig wrote:
Dear Moses Mailing List,
This is not directly related to Moses, but I was wondering if there
are any high-quality, multilingual sentence-aligned corpora
available (i.e., 3 or more languages with aligned sentences). We're
aware of the Europarl and Bible corpora, but Europarl only covers
European languages, and the Bible corpus is quite small in MT terms.
TED and MULTI-UN are options, but as far as I know the data is only
bilingually aligned at the moment, and it can be a bit hard to get a
clean multilingual corpus from them. If anyone has any experience
with this, or resources available, I'd love some info.
Thanks in advance,
Graham
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support