Chris,

For my specific use cases, it would be best to be able to set N mappers/reducers per job per node (so I can explicitly say: run at most 2 of this CPU-bound task at a time on any given node). However, the other way would work as well (on a 10-node cluster, I would set the job to a max of 20 tasks at a time globally), but it opens up the possibility that a node could be assigned more than 2 of that task.
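For reference, the closest existing knobs I know of are the TaskTracker's per-node maximums, but those cap tasks across all jobs at once rather than per job (a sketch of the hadoop-site.xml settings as I understand them; the values are just illustrative):

  <!-- hadoop-site.xml on each node: caps concurrent tasks per TaskTracker,
       but for ALL jobs together, not for any single job -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>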
I would work with whatever is easiest to implement, as either would be a vast improvement for me (I could run high numbers of network-latency-bound tasks without fear of CPU-bound tasks killing the cluster).

JG

> -----Original Message-----
> From: Chris K Wensel [mailto:ch...@wensel.net]
> Sent: Tuesday, February 03, 2009 11:34 AM
> To: core-user@hadoop.apache.org
> Subject: Re: Control over max map/reduce tasks per job
>
> Hey Jonathan
>
> Are you looking to limit the total number of concurrent mappers/
> reducers a single job can consume cluster-wide, or to limit the number
> per node?
>
> That is, you have X mappers/reducers, but can only allow N mappers/
> reducers to run at a time globally for a given job.
>
> Or, you are cool with all X running concurrently globally, but want to
> guarantee that no node can run more than N tasks from that job?
>
> Or both?
>
> Just reconciling the conversation we had last week with this thread.
>
> ckw
>
> On Feb 3, 2009, at 11:16 AM, Jonathan Gray wrote:
>
> > All,
> >
> > I have a few relatively small clusters (5-20 nodes) and am having
> > trouble keeping them loaded with my MR jobs.
> >
> > The primary issue is that I have jobs with drastically different
> > patterns. I have jobs that read/write to/from HBase or Hadoop with
> > minimal logic (network-throughput or IO bound), others that perform
> > crawling (network-latency bound), and one huge streaming parsing job
> > (very CPU bound; each task eats a core).
> >
> > I'd like to launch very large numbers of tasks for the
> > network-latency-bound jobs; however, the large CPU-bound job means I
> > have to keep the max maps allowed per node low enough not to starve
> > the DataNode and RegionServer.
> >
> > I'm an HBase dev but not familiar enough with the Hadoop MR code to
> > know what would be involved in implementing this. However, in talking
> > with other users, it seems like this would be a well-received option.
> >
> > I wanted to ping the list before filing an issue because it seems
> > like someone may have thought about this in the past.
> >
> > Thanks.
> >
> > Jonathan Gray
>
> --
> Chris K Wensel
> ch...@wensel.net
> http://www.cascading.org/
> http://www.scaleunlimited.com/
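To make the two semantics in the thread concrete, here is a hypothetical sketch of what the requested job-level settings might look like; neither property name exists in Hadoop today, and both are invented purely for illustration:

  <!-- hypothetical, NOT real Hadoop properties: submitted with the job -->
  <property>
    <!-- per-node cap: at most N of this job's tasks on any one node -->
    <name>mapred.job.max.tasks.per.node</name>
    <value>2</value>
  </property>
  <property>
    <!-- global cap: at most N of this job's tasks running cluster-wide -->
    <name>mapred.job.max.tasks</name>
    <value>20</value>
  </property>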