I have filed an issue for this: https://issues.apache.org/jira/browse/HADOOP-5170
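To make the request concrete, here's a rough sketch of what job submission could look like with such caps. The two property names below are placeholders I'm making up for illustration, not existing Hadoop configuration keys; whatever actually comes out of HADOOP-5170 may well look different:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class CappedParseJob {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(CappedParseJob.class);
        conf.setJobName("cpu-bound-parse");
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // Placeholder key: per-job, per-node cap. Never schedule more
        // than 2 map tasks of this job on any single TaskTracker,
        // regardless of how many free map slots the node has.
        conf.setInt("mapred.max.maps.per.node", 2);

        // Placeholder key: per-job, cluster-wide cap. At most 20 of
        // this job's map tasks run concurrently across the cluster.
        conf.setInt("mapred.max.maps.per.cluster", 20);

        JobClient.runJob(conf);
      }
    }

Either knob alone would solve my problem; having both would be ideal.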
JG

> -----Original Message-----
> From: Bryan Duxbury [mailto:br...@rapleaf.com]
> Sent: Tuesday, February 03, 2009 10:59 PM
> To: core-user@hadoop.apache.org
> Subject: Re: Control over max map/reduce tasks per job
>
> This sounds good enough for a JIRA ticket to me.
>
> -Bryan
>
> On Feb 3, 2009, at 11:44 AM, Jonathan Gray wrote:
>
> > Chris,
> >
> > For my specific use cases, it would be best to be able to set N
> > mappers/reducers per job per node (so I can explicitly say: run at
> > most 2 at a time of this CPU-bound task on any given node). The
> > other way would work as well (on a 10-node system, I would set the
> > job to at most 20 tasks at a time globally), but it opens up the
> > possibility that a node could be assigned more than 2 of that task.
> >
> > I would go with whichever is easiest to implement, as either would
> > be a vast improvement for me (I could run large numbers of
> > network-latency-bound tasks without fear of CPU-bound tasks killing
> > the cluster).
> >
> > JG
> >
> >> -----Original Message-----
> >> From: Chris K Wensel [mailto:ch...@wensel.net]
> >> Sent: Tuesday, February 03, 2009 11:34 AM
> >> To: core-user@hadoop.apache.org
> >> Subject: Re: Control over max map/reduce tasks per job
> >>
> >> Hey Jonathan,
> >>
> >> Are you looking to limit the total number of concurrent mappers/
> >> reducers a single job can consume cluster-wide, or to limit the
> >> number per node?
> >>
> >> That is, you have X mappers/reducers, but can only allow N of them
> >> to run at a time globally for a given job.
> >>
> >> Or, are you fine with all X running concurrently globally, but
> >> want to guarantee that no node runs more than N tasks from that
> >> job?
> >>
> >> Or both?
> >>
> >> Just reconciling the conversation we had last week with this
> >> thread.
> >>
> >> ckw
> >>
> >> On Feb 3, 2009, at 11:16 AM, Jonathan Gray wrote:
> >>
> >>> All,
> >>>
> >>> I have a few relatively small clusters (5-20 nodes) and am having
> >>> trouble keeping them loaded with my MR jobs.
> >>>
> >>> The primary issue is that I have jobs with drastically different
> >>> patterns. Some read/write to/from HBase or Hadoop with minimal
> >>> logic (network-throughput or I/O bound), others perform crawling
> >>> (network-latency bound), and one huge parsing streaming job is
> >>> very CPU bound (each task eats a core).
> >>>
> >>> I'd like to launch very large numbers of tasks for the
> >>> network-latency-bound jobs, but the big CPU-bound job means I
> >>> have to keep the max maps allowed per node low enough so as not
> >>> to starve the DataNode and RegionServer.
> >>>
> >>> I'm an HBase dev, but not familiar enough with the Hadoop MR code
> >>> to even know what would be involved in implementing this.
> >>> However, in talking with other users, it seems like this would be
> >>> a well-received option.
> >>>
> >>> I wanted to ping the list before filing an issue because it seems
> >>> like someone may have thought about this in the past.
> >>>
> >>> Thanks.
> >>>
> >>> Jonathan Gray
> >>
> >> --
> >> Chris K Wensel
> >> ch...@wensel.net
> >> http://www.cascading.org/
> >> http://www.scaleunlimited.com/
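P.S. For anyone who finds this thread later: the main knobs that exist today are the per-TaskTracker slot limits, and those apply uniformly to every job, which is exactly the problem described above. For example, in hadoop-site.xml:

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>2</value>
      <description>The maximum number of map tasks run simultaneously
      by a TaskTracker, counted across all jobs, not per job.
      </description>
    </property>

    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>2</value>
    </property>

Setting these low enough to keep the CPU-bound job from starving the DataNode and RegionServer also throttles the latency-bound jobs; that gap is what HADOOP-5170 is meant to close.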