Hi Vincent

Are you comparing compressed with uncompressed files?
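A quick way to check is to measure both sizes for one of the single-year downloads. This is only a minimal sketch, assuming that file is gzip-compressed; the function name `gzip_sizes` and the example filename are hypothetical and not taken from the WMT11 page.

```python
import gzip
import os


def gzip_sizes(path, chunk_size=1 << 20):
    """Return (compressed_bytes, uncompressed_bytes) for a gzip file.

    Assumes `path` points to a gzip-compressed file. The file is streamed
    in chunks so a multi-gigabyte corpus is never held in memory.
    """
    compressed = os.path.getsize(path)  # size on disk, as downloaded
    uncompressed = 0
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            uncompressed += len(chunk)  # count decompressed bytes
    return compressed, uncompressed


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python gzip_sizes.py news.2008.en.gz
    if len(sys.argv) > 1:
        c, u = gzip_sizes(sys.argv[1])
        print(f"compressed: {c} bytes, uncompressed: {u} bytes")
```

If the uncompressed figure is close to the size of the same file extracted from the big archive, the difference is just compression.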
cheers - Barry

On 04/10/16 14:40, Vincent Nguyen wrote:
> Hi,
>
> on this link:
>
> http://www.statmt.org/wmt11/translation-task.html
>
> on the download section for monolingual data, there is :
>
> one big file : http://www.statmt.org/wmt11/training-monolingual.tgz
>
> And separate files, of which news crawls per year.
>
> However, when you take a single file for a specific year, it is not the
> same size as the same name file in the big download.
>
> expanded size for english corpus :
>
> news2008: 4.3GB vs 1.6GB for single download
> news2009: 5.3GB vs 1.8GB for single download
>
> etc...
>
> can someone please explain the difference ?
>
> thanks
>
> Vincent.
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
