Hi, I have a 20-node Hadoop cluster processing large log files. I've seen it said that there's never any reason to make the input split size larger than a single HDFS block (64 MB), because doing so gives up data locality for no benefit.
But when I kick off a job against the whole dataset with that default split size, I get about 180,000 map tasks, most lasting about 9-15 seconds each. Typically I get through about half of them before the JobTracker freezes with OOM errors. I realize I could just raise HADOOP_HEAPSIZE on the JobTracker node. But it also seems like we ought to have fewer map tasks, each lasting more like 1 to 1.5 minutes, to reduce the JobTracker's overhead of managing so many tasks, as well as the overhead on the cluster nodes of starting and cleaning up after so many child JVMs. Isn't that a compelling reason for upping the input split size? Or am I missing something? Thanks
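To put numbers on my reasoning, here's the back-of-envelope scaling I have in mind. The 180,000-task count and 64 MB block size are from my actual run; the larger split sizes and the ~12 s average task time are just hypothetical values for illustration:

```java
// Rough sketch: how task count and per-task runtime scale with split size.
// Observed: ~180,000 map tasks at the 64 MB default, 9-15 s each.
// The 256/512 MB split sizes below are hypothetical alternatives.
public class SplitMath {
    public static void main(String[] args) {
        final long blockMB = 64;
        final long observedTasks = 180_000;
        final long datasetMB = observedTasks * blockMB; // ~11.5 TB of logs

        for (long splitMB : new long[] {64, 256, 512}) {
            long tasks = datasetMB / splitMB;
            // Assume runtime scales linearly with data per task,
            // starting from a ~12 s midpoint at 64 MB.
            double approxTaskSeconds = 12.0 * splitMB / blockMB;
            System.out.println(splitMB + " MB splits -> " + tasks
                    + " map tasks, roughly " + approxTaskSeconds + " s each");
        }
    }
}
```

So 512 MB splits would mean ~22,500 tasks of ~1.5 minutes each, which is the regime I was describing. (If I understand the old mapred API correctly, the knob for this would be something like mapred.min.split.size, or packing files with CombineFileInputFormat, though I'd appreciate confirmation on that.)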
