Hi, on this link:
http://www.statmt.org/wmt11/translation-task.html
in the download section for monolingual data, there is:

one big file: http://www.statmt.org/wmt11/training-monolingual.tgz
and separate files, including news crawls per year.

However, when you take a single file for a specific year, it is not the same size as the file of the same name in the big download. Expanded sizes for the English corpus:

news2008: 4.3GB in the big download vs 1.6GB for the single download
news2009: 5.3GB in the big download vs 1.8GB for the single download
etc.

Can someone please explain the difference?

Thanks,
Vincent
