I have filed an issue for this: https://issues.apache.org/jira/browse/HADOOP-5170
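To make the request concrete, here's a rough sketch of what job submission could look like with such caps. The two property names below are placeholders I'm making up for illustration, not existing Hadoop configuration keys; whatever actually comes out of HADOOP-5170 may well look different:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class CappedParseJob {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(CappedParseJob.class);
        conf.setJobName("cpu-bound-parse");
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // Placeholder key: per-job, per-node cap. Never schedule more
        // than 2 map tasks of this job on any single TaskTracker,
        // regardless of how many free map slots the node has.
        conf.setInt("mapred.max.maps.per.node", 2);

        // Placeholder key: per-job, cluster-wide cap. At most 20 of
        // this job's map tasks run concurrently across the cluster.
        conf.setInt("mapred.max.maps.per.cluster", 20);

        JobClient.runJob(conf);
      }
    }

Either knob alone would solve my problem; having both would be ideal.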
JG

> -----Original Message-----
> From: Bryan Duxbury [mailto:br...@rapleaf.com]
> Sent: Tuesday, February 03, 2009 10:59 PM
> To: core-user@hadoop.apache.org
> Subject: Re: Control over max map/reduce tasks per job
>
> This sounds good enough for a JIRA ticket to me.
>
> -Bryan
>
> On Feb 3, 2009, at 11:44 AM, Jonathan Gray wrote:
>
> > Chris,
> >
> > For my specific use cases, it would be best to be able to set N
> > mappers/reducers per job per node (so I can explicitly say: run at
> > most 2 at a time of this CPU-bound task on any given node). The
> > other way would work as well (on a 10-node system, I would set the
> > job to at most 20 tasks at a time globally), but it opens up the
> > possibility that a node could be assigned more than 2 of that task.
> >
> > I would go with whichever is easiest to implement, as either would
> > be a vast improvement for me (I could run large numbers of
> > network-latency-bound tasks without fear of CPU-bound tasks killing
> > the cluster).
> >
> > JG
> >
> >> -----Original Message-----
> >> From: Chris K Wensel [mailto:ch...@wensel.net]
> >> Sent: Tuesday, February 03, 2009 11:34 AM
> >> To: core-user@hadoop.apache.org
> >> Subject: Re: Control over max map/reduce tasks per job
> >>
> >> Hey Jonathan,
> >>
> >> Are you looking to limit the total number of concurrent mappers/
> >> reducers a single job can consume cluster-wide, or to limit the
> >> number per node?
> >>
> >> That is, you have X mappers/reducers, but can only allow N of them
> >> to run at a time globally for a given job.
> >>
> >> Or, are you fine with all X running concurrently globally, but
> >> want to guarantee that no node runs more than N tasks from that
> >> job?
> >>
> >> Or both?
> >>
> >> Just reconciling the conversation we had last week with this
> >> thread.
> >>
> >> ckw
> >>
> >> On Feb 3, 2009, at 11:16 AM, Jonathan Gray wrote:
> >>
> >>> All,
> >>>
> >>> I have a few relatively small clusters (5-20 nodes) and am having
> >>> trouble keeping them loaded with my MR jobs.
> >>>
> >>> The primary issue is that I have jobs with drastically different
> >>> patterns. Some read/write to/from HBase or Hadoop with minimal
> >>> logic (network-throughput or I/O bound), others perform crawling
> >>> (network-latency bound), and one huge parsing streaming job is
> >>> very CPU bound (each task eats a core).
> >>>
> >>> I'd like to launch very large numbers of tasks for the
> >>> network-latency-bound jobs, but the big CPU-bound job means I
> >>> have to keep the max maps allowed per node low enough so as not
> >>> to starve the DataNode and RegionServer.
> >>>
> >>> I'm an HBase dev, but not familiar enough with the Hadoop MR code
> >>> to even know what would be involved in implementing this.
> >>> However, in talking with other users, it seems like this would be
> >>> a well-received option.
> >>>
> >>> I wanted to ping the list before filing an issue because it seems
> >>> like someone may have thought about this in the past.
> >>>
> >>> Thanks.
> >>>
> >>> Jonathan Gray
> >>
> >> --
> >> Chris K Wensel
> >> ch...@wensel.net
> >> http://www.cascading.org/
> >> http://www.scaleunlimited.com/
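P.S. For anyone who finds this thread later: the main knobs that exist today are the per-TaskTracker slot limits, and those apply uniformly to every job, which is exactly the problem described above. For example, in hadoop-site.xml:

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>2</value>
      <description>The maximum number of map tasks run simultaneously
      by a TaskTracker, counted across all jobs, not per job.
      </description>
    </property>

    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>2</value>
    </property>

Setting these low enough to keep the CPU-bound job from starving the DataNode and RegionServer also throttles the latency-bound jobs; that gap is what HADOOP-5170 is meant to close.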