Yes, but wouldn't M/R work better with 128M gzip files and a matching 128M
block size?

Anyhow, thanks for the useful information.

On Thu, Mar 17, 2011 at 5:07 PM, Harsh J <qwertyman...@gmail.com> wrote:

> On Thu, Mar 17, 2011 at 7:51 PM, Lior Schachter <li...@infolinks.com>
> wrote:
> > Currently each gzip file is about 250MB (* 60 files = 15G), so we have 256M
> > blocks.
>
> Darn, I ought to sleep a bit more. I computed files/GB and read it as
> GB/file. Mehh..
>
> >
> > However, I understand that to make better use of M/R parallel
> > processing, smaller files/blocks are better.
>
> Yes this is true in case of text/sequence files.
>
> > So maybe having 128M gzip files with a corresponding 128M block size
> > would be better?
>
> Why not 256M blocks for all your ~250MB _gzip_ files, making each nearly
> one block, since they would not be split anyway?
>
> --
> Harsh J
> http://harshj.com
>
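
To put rough numbers on the question in this thread, here is a minimal sketch.
It assumes a simplified rule: a gzip file is not splittable, so it gets exactly
one map task however many blocks it spans, while splittable input gets roughly
one map task per block. The function names and the repacking into 120 files of
125MB are hypothetical, chosen only to keep the 15G total; real split
computation also depends on split-size settings that are ignored here.

```python
import math

MB = 1024 * 1024


def blocks_per_file(file_size, block_size):
    """Number of HDFS blocks a single file occupies."""
    return max(1, math.ceil(file_size / block_size))


def map_tasks(file_sizes, block_size, splittable):
    """Estimate the map task count under the simplified rule above.

    Splittable input: about one task per block.
    Non-splittable input (e.g. gzip): one task per file.
    """
    if splittable:
        return sum(blocks_per_file(size, block_size) for size in file_sizes)
    return len(file_sizes)


# Layout described in the thread: 60 gzip files of about 250MB each.
current = [250 * MB] * 60

# Hypothetical repacking of the same data into files just under 128M.
repacked = [125 * MB] * 120

for label, files, block in [
    ("250MB gzip, 256M blocks", current, 256 * MB),
    ("250MB gzip, 128M blocks", current, 128 * MB),
    ("125MB gzip, 128M blocks", repacked, 128 * MB),
]:
    print(
        label,
        "-> map tasks:", map_tasks(files, block, splittable=False),
        "| blocks per file:", blocks_per_file(files[0], block),
    )
```

Under these assumptions, changing only the block size leaves the gzip job at 60
map tasks; with 128M blocks each 250MB file simply spans two blocks that one
task has to read. It is the repacking into smaller gzip files that raises the
task count to 120, with each file still fitting in a single block.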
