(Apologies for cross-posting.)

We are happy to announce a new release (v1.1) of the Europarl-ST corpus, a speech translation corpus of parliamentary debates consisting of audio-transcription-translation triples.
This release adds 3 new languages (Romanian, Polish and Dutch). Together with the 6 languages already available (German, English, Spanish, French, Italian and Portuguese), the corpus now covers 9 languages and offers 72 speech translation directions (each of the 9 languages paired with the other 8).

We have also released a new set, train-noisy, which contains the speeches that were discarded during our filtering process, as they may still be useful for some training regimes.

Full details and download links are available at the corpus webpage: https://mllp.upv.es/europarl-st/

Best regards,

Javier Iranzo-Sánchez
PhD Student
MLLP-VRAIN, UPV
Valencia