Hi Barry,

Good job. For some language pairs with fewer than 10k sentence pairs, the reported BLEU scores are quite appealing.

Best Regards,
Doren

On Wednesday, January 29, 2020, Barry Haddow <[email protected]> wrote:

> Hi All
>
> We have released a new sentence aligned corpora pairing English with 13
> different languages spoken in India. Up to 56k sentence pairs are
> available for each pair. The languages of India contained in the corpora
> are Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Manipuri,
> Marathi, Odia, Punjabi, Tamil, Telugu and Urdu. We also provide a larger
> version of the corpus, document-aligned only.
>
> The corpus is available here: http://data.statmt.org/pmindia/
>
> There is an accompanying paper which describes the construction of the
> corpus, a comparison of alignment methods, and some initial MT results.
>
> https://arxiv.org/abs/2001.09907
>
> Barry Haddow and Faheem Kirefu
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
