Hi,

On this page:

http://www.statmt.org/wmt11/translation-task.html

in the download section for monolingual data, there is

one big file: http://www.statmt.org/wmt11/training-monolingual.tgz

and a set of separate files, including one news crawl per year.

However, a single file downloaded for a specific year is not the same
size as the file of the same name in the big download.

Expanded sizes for the English corpus:

news2008: 4.3GB vs 1.6GB for single download
news2009: 5.3GB vs 1.8GB for single download

etc.

Can someone please explain the difference?

Thanks,

Vincent.


_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
