Dear moses-list,

We are making available on the web the English, Czech, Finnish, German, Latvian, Romanian, Russian, Turkish, and Chinese datasets used to build ParFDA Moses SMT models for the WMT'16 (http://www.statmt.org/wmt16/) and WMT'17 (http://www.statmt.org/wmt17/) translation tasks. They can be downloaded from:
[WMT'17] https://drive.google.com/drive/folders/0B2k8ISN7gmi1SnA1d1gxcTQ5TTg?usp=sharing
[WMT'16] https://drive.google.com/drive/folders/0B2k8ISN7gmi1NHNTSGFrMGhfaVU?usp=sharing

WMT'16 results are reported in the following paper:

Ergun Bicici. *ParFDA for Instance Selection for Statistical Machine Translation*. In *Proc. of the First Conference on Statistical Machine Translation (WMT16)*, Berlin, Germany, August 2016. Association for Computational Linguistics.

ParFDA selected the datasets for the WMT'16 and WMT'17 translation tasks from the pool of sentences made available by the WMT organization. The ParFDA Moses SMT results can serve as a benchmark for SMT research. The language model corpora contain ~15M sentences, and the language models were built using kenlm (https://kheafield.com/code/kenlm/).

LICENSE Note: BSD license. To make the datasets available, we also inherit the terms of the license of the WMT conference organization, which allows use for research purposes.

ParFDA WMT SMT datasets:
- ParFDA WMT'17 Datasets (https://github.com/bicici/ParFDAWMT17)
- ParFDA WMT'16 Datasets (https://github.com/bicici/ParFDAWMT16)
- ParFDA WMT'15 Datasets (https://github.com/bicici/ParFDAWMT15)
- ParFDA WMT'14 Datasets (https://github.com/bicici/ParFDAWMT14)

Best Regards,
Ergun

TUBITAK BILGEM B3LAB Cloud Computing Laboratory
bicici.github.com
