Hello,

On Tue, Mar 29, 2011 at 11:48 PM, W.P. McNeill <[email protected]> wrote:
>   2. Decrease the block size of my input files. Do I have to do a
> distcp with a non-default block size?  (I think the answer is that I
> have to do a distcp, but I'm making sure.)

A distcp or even a plain "-cp" with a proper -Ddfs.blocksize=<size>
parameter passed along should do the trick.
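A rough sketch of what that could look like (paths here are made up,
and note that older Hadoop releases spell the property dfs.block.size
rather than dfs.blocksize; the value must be a multiple of the
checksum chunk size, 512 bytes by default):

```shell
# Hypothetical paths. 33554432 bytes = 32 MiB; on older Hadoop
# versions, pass -Ddfs.block.size instead of -Ddfs.blocksize.
hadoop distcp -Ddfs.blocksize=33554432 /user/wp/input /user/wp/input-32m

# For a smaller dataset, a plain fs -cp accepts the same -D override:
hadoop fs -Ddfs.blocksize=33554432 -cp /user/wp/input /user/wp/input-32m
```

Either way the data is physically rewritten; you can't change the
block size of files already in HDFS in place.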

> Are there other approaches?

You can have a look at schedulers that guarantee resources to a
submitted job, perhaps?

> Are there other gotchas that come with trying
> to increase mapper granularity?

One thing that comes to mind is that the more splits you have (i.e.,
the more tasks), the more metadata the JobTracker has to hold and
maintain in its memory. Second, your NameNode also needs to hold more
bytes in memory for every such finely-granulated set of files (since
lowering block sizes leads to a LOT more block info and replica
locations to keep track of).
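To put some rough numbers on that (my own back-of-the-envelope sketch,
not measured figures, assuming the default of one input split per HDFS
block and a replication factor of 3):

```python
def num_splits(file_size, block_size):
    """Input splits (~= map tasks) for one file, assuming the
    default FileInputFormat behavior of one split per HDFS block."""
    return -(-file_size // block_size)  # ceiling division

MiB = 1024 ** 2
GiB = 1024 ** 3

file_size = 10 * GiB  # hypothetical input file
replication = 3       # default HDFS replication factor

for block_size in (128 * MiB, 64 * MiB, 8 * MiB):
    splits = num_splits(file_size, block_size)
    # The NameNode tracks each block plus one replica location per
    # copy, so block metadata grows linearly as block size shrinks.
    replica_locations = splits * replication
    print(block_size // MiB, splits, replica_locations)
```

Halving the block size doubles both the task count the JobTracker
tracks and the block/replica records the NameNode keeps in memory, so
there is a real floor below which shrinking blocks stops being worth
it.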

-- 
Harsh J
http://harshj.com
