Hi Vincent

Could you say exactly which files you are comparing?

cheers - Barry

On 04/10/16 21:20, Vincent Nguyen wrote:
>
> no.... but my mistake I was comparing with that link for the per year
> files : http://www.statmt.org/wmt15/translation-task.html
>
> what is the difference ? (with the wmt11 files)
>
>
>
> On 04/10/2016 at 21:46, Barry Haddow wrote:
>> Hi Vincent
>>
>> Are you comparing compressed with uncompressed files?
>>
>> cheers - Barry
>>
>> On 04/10/16 14:40, Vincent Nguyen wrote:
>>> Hi,
>>>
>>> on this link:
>>>
>>> http://www.statmt.org/wmt11/translation-task.html
>>>
>>> on the download section for monolingual data, there is :
>>>
>>> one big file : http://www.statmt.org/wmt11/training-monolingual.tgz
>>>
>>> And separate files, of which news crawls per year.
>>>
>>> However, when you take a single file for a specific year, it is not the
>>> same size as the same name file in the big download.
>>>
>>> expanded size for english corpus :
>>>
>>> news2008: 4.3GB vs 1.6GB for single download
>>> news2009: 5.3GB vs 1.8GB for single download
>>>
>>> etc...
>>>
>>> can someone please explain the difference ?
>>>
>>> thanks
>>>
>>> Vincent.
>>>
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> [email protected]
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>
>>
>>
>
>

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
