Hey Jonathan

Are you looking to limit the total number of concurrent mappers/reducers a single job can consume cluster-wide, or limit the number per node?

That is, you have X mappers/reducers, but can only allow N mappers/reducers to run at a time globally, for a given job.

Or, you are cool with all X running concurrently globally, but want to guarantee that no node can run more than N tasks from that job?

Or both?
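
(Rough numbers, just to illustrate the distinction: say the job has X = 200 map tasks on a 10-node cluster. A global cap of N = 40 means at most 40 of its tasks run anywhere in the cluster at once; a per-node cap of N = 4 means no single node runs more than 4 of them, even if many more are running cluster-wide.)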

Just reconciling the conversation we had last week with this thread.

ckw

On Feb 3, 2009, at 11:16 AM, Jonathan Gray wrote:

All,



I have a few relatively small clusters (5-20 nodes) and am having trouble
keeping them loaded with my MR jobs.



The primary issue is that I have different jobs that have drastically
different patterns. I have jobs that read/write to/from HBase or Hadoop
with minimal logic (network throughput bound or I/O bound), others that
perform crawling (network latency bound), and one huge parsing streaming job
(very CPU bound, each task eats a core).



I'd like to launch very large numbers of tasks for the network latency bound jobs; however, the large CPU-bound job means I have to keep the max maps allowed per node low enough so as not to starve the DataNode and RegionServer.
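
(For reference, the per-node cap I'm talking about is the standard tasktracker slot setting, roughly the following in each node's Hadoop config; the value here is just an example, and there is a matching mapred.tasktracker.reduce.tasks.maximum for reduces:)

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>

Since that setting applies to every job scheduled on the node, keeping it low to protect against the CPU-bound job also caps the latency-bound jobs.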



I'm an HBase dev but not familiar enough with Hadoop MR code to even know what would be involved with implementing this. However, in talking with
other users, it seems like this would be a well-received option.



I wanted to ping the list before filing an issue because it seems like
someone may have thought about this in the past.



Thanks.



Jonathan Gray


--
Chris K Wensel
ch...@wensel.net
http://www.cascading.org/
http://www.scaleunlimited.com/
