On Thu, Mar 17, 2011 at 7:51 PM, Lior Schachter <li...@infolinks.com> wrote:
> Currently each gzip file is about 250MB (*60files=15G) so we have 256M
> blocks.

Darn, I ought to sleep a bit more. I did a file/gb and read it as gb/file, mehh..

> However I understand that in order to utilize better M/R parallel processing
> smaller files/blocks are better.

Yes, this is true in the case of text/sequence files.

> So maybe having 128M gzip files with corresponding 128M block size would be
> better?

Why not 256M blocks for all your ~250MB _gzip_ files, so that each file is nearly one block, since they would not be split anyways?

-- 
Harsh J
http://harshj.com
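
P.S. A rough sketch of the arithmetic behind this, in case it helps. It is a simplified model, not Hadoop code: it assumes one input split per block for splittable files and exactly one split per file for gzip. The function names are made up for illustration.

```python
import math


def blocks_per_file(file_mb: int, block_mb: int) -> int:
    """Number of HDFS blocks one file of file_mb occupies."""
    return math.ceil(file_mb / block_mb)


def map_tasks(num_files: int, file_mb: int, block_mb: int, splittable: bool) -> int:
    """Approximate number of map tasks for the job.

    Assumption: a splittable file gets one split per block, while a
    gzip file is read whole by a single mapper whatever the block size.
    """
    if splittable:
        return num_files * blocks_per_file(file_mb, block_mb)
    return num_files


if __name__ == "__main__":
    num_files, file_mb = 60, 250  # figures from the thread
    for block_mb in (128, 256):
        print(
            f"block={block_mb}M",
            f"blocks/file={blocks_per_file(file_mb, block_mb)}",
            f"maps(gzip)={map_tasks(num_files, file_mb, block_mb, False)}",
            f"maps(splittable)={map_tasks(num_files, file_mb, block_mb, True)}",
        )
```

Under these assumptions the gzip job gets 60 maps with either block size. Going to 128M blocks only means each mapper has to read a second block, which may not be local to it.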