This sounds good enough for a JIRA ticket to me.
-Bryan

On Feb 3, 2009, at 11:44 AM, Jonathan Gray wrote:

Chris,

For my specific use cases, it would be best to be able to set N mappers/reducers per job per node (so I can explicitly say: run at most 2 tasks of this CPU-bound job at a time on any given node). The other way would work as well (on a 10-node cluster, I would set the job to a max of 20 tasks at a time globally), but it opens up the possibility that a node could be assigned more than 2 of that job's tasks.
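
To make the two options concrete, here is a minimal sketch of what a per-job knob might look like. Neither property name below exists in Hadoop today; both are hypothetical placeholders for whatever the JIRA settles on, and ParseJobDriver is a made-up driver class:

```java
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class ParseJobDriver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(ParseJobDriver.class);
    conf.setJobName("cpu-bound-parse");
    // ... mapper/reducer classes and input/output paths omitted ...

    // Option 1 (per node): at most 2 of this job's tasks on any one node.
    conf.setInt("mapred.max.maps.per.node", 2);    // hypothetical property

    // Option 2 (cluster-wide): at most 20 of this job's tasks running at
    // once, but any one node could still be assigned more than 2 of them.
    conf.setInt("mapred.max.running.maps", 20);    // hypothetical property

    JobClient.runJob(conf);
  }
}
```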

I would work with whatever is easiest to implement, as either would be a vast improvement for me (I could run high numbers of network-latency-bound tasks without fear of CPU-bound tasks killing the cluster).

JG



-----Original Message-----
From: Chris K Wensel [mailto:ch...@wensel.net]
Sent: Tuesday, February 03, 2009 11:34 AM
To: core-user@hadoop.apache.org
Subject: Re: Control over max map/reduce tasks per job

Hey Jonathan

Are you looking to limit the total number of concurrent mappers/reducers a single job can consume cluster-wide, or limit the number per node?

That is, you have X mappers/reducers, but can only allow N mappers/reducers to run at a time globally for a given job.

Or are you cool with all X running concurrently globally, but want to guarantee that no node runs more than N tasks from that job?

Or both?

Just reconciling the conversation we had last week with this thread.

ckw

On Feb 3, 2009, at 11:16 AM, Jonathan Gray wrote:

All,



I have a few relatively small clusters (5-20 nodes) and am having trouble keeping them loaded with my MR jobs.



The primary issue is that I have jobs with drastically different patterns. Some read/write to/from HBase or Hadoop with minimal logic (network-throughput or IO bound), others perform crawling (network-latency bound), and one huge parsing streaming job is very CPU bound (each task eats a core).



I'd like to launch very large numbers of tasks for the network-latency-bound jobs; however, the large CPU-bound job means I have to keep the max maps allowed per node low enough so as not to starve the DataNode and RegionServer.
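
For context, the only related knobs that exist today are the per-TaskTracker maximums, which cap task slots per node across all jobs rather than per job. They normally live in each node's hadoop-site.xml; the sketch below sets them programmatically just for illustration:

```java
import org.apache.hadoop.conf.Configuration;

public class SlotCapExample {
  public static void main(String[] args) {
    // These properties are real, but they are TaskTracker-wide: every job
    // shares the same per-node slot cap, so lowering them to protect the
    // DataNode/RegionServer also throttles the latency-bound jobs.
    Configuration conf = new Configuration();
    conf.setInt("mapred.tasktracker.map.tasks.maximum", 2);
    conf.setInt("mapred.tasktracker.reduce.tasks.maximum", 2);
    System.out.println("map slots per node: "
        + conf.getInt("mapred.tasktracker.map.tasks.maximum", 2));
  }
}
```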



I'm an HBase dev, but I'm not familiar enough with the Hadoop MR code to know what would be involved in implementing this. However, in talking with other users, it seems like this would be a well-received option.



I wanted to ping the list before filing an issue because it seems like someone may have thought about this in the past.



Thanks.



Jonathan Gray


--
Chris K Wensel
ch...@wensel.net
http://www.cascading.org/
http://www.scaleunlimited.com/

