Hi Vincent

Are you comparing compressed with uncompressed files?
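
One way to check this is to measure both sizes of the same file. The following is only a minimal sketch: it assumes the single-year downloads are gzip-compressed, which this thread does not confirm, and the file name `sample.en.gz` and the helper names are hypothetical.

```python
import gzip
import os
import tempfile


def compressed_and_expanded_size(path, chunk_size=1 << 20):
    """Return (bytes on disk, bytes after gzip decompression) for a .gz file."""
    on_disk = os.path.getsize(path)
    expanded = 0
    with gzip.open(path, "rb") as handle:
        # Stream in chunks so a multi-gigabyte corpus never sits in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            expanded += len(chunk)
    return on_disk, expanded


def demo():
    # Hypothetical stand-in for a news crawl file: repetitive plain text.
    text = b"the quick brown fox jumps over the lazy dog\n" * 10000
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.en.gz")
        with gzip.open(path, "wb") as out:
            out.write(text)
        return compressed_and_expanded_size(path), len(text)
```

If the second number for a single-year download is close to the size of the same file extracted from the big archive, the difference is just compression.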

cheers - Barry

On 04/10/16 14:40, Vincent Nguyen wrote:
> Hi,
>
> on this link:
>
> http://www.statmt.org/wmt11/translation-task.html
>
> in the download section for monolingual data, there is:
>
> one big file: http://www.statmt.org/wmt11/training-monolingual.tgz
>
> and separate files, including news crawls per year.
>
> However, when you take a single file for a specific year, it is not the
> same size as the file of the same name in the big download.
>
> expanded size for the English corpus:
>
> news2008: 4.3GB vs 1.6GB for single download
> news2009: 5.3GB vs 1.8GB for single download
>
> etc...
>
> Can someone please explain the difference?
>
> thanks
>
> Vincent.
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
