On Fri, 20 May 2011 10:11:13 -0500 Brian Bockelman <[email protected]> wrote:
> On May 20, 2011, at 6:10 AM, Dieter Plaetinck wrote:
>
> > What do you mean clunky?
> > IMHO this is a quite elegant, simple, working solution.
>
> Try giving it to a user; watch them feed it a list of 10,000 files;
> watch the machine swap to death and the disks uselessly thrash.
>
> > Sure this spawns multiple processes, but it beats any
> > api-overcomplications, imho.
>
> Simple doesn't imply scalable, unfortunately.
>
> Brian

True, I assumed that if anyone wants this, they know what they're doing
(i.e. the files could be small and already in the Linux block cache).
Because why would anyone read files in parallel if that causes disk
seeks all over the place? Ideally, you should tune for one sequential
read per disk at a time. In that respect, I definitely agree that some
clever logic in userspace to optimize disk reads (across a bunch of
different possible hardware setups) would be beneficial.

Dieter
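[Editor's note: the "one sequential read per disk at a time" idea discussed above can be sketched as follows. This is a minimal illustration, not code from the thread; `read_sequential` is a hypothetical helper that reads the given files strictly one after another, in large chunks, so a spinning disk serves a single sequential stream instead of seeking between 10,000 concurrent readers.]

```python
def read_sequential(paths, chunk_size=1 << 20):
    """Yield (path, chunk) pairs, reading files strictly one at a time.

    Only one file is open at any moment, and it is consumed in large
    sequential chunks -- the opposite of spawning one reader process
    per file, which forces the disk head to thrash between streams.
    """
    for path in paths:
        with open(path, "rb") as f:
            while chunk := f.read(chunk_size):
                yield path, chunk
```

On a box with several independent disks, one such sequential stream per disk (rather than per file) is the configuration Dieter is suggesting the userspace logic should aim for.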
