Chris,

For my specific use cases, it would be best to be able to set N mappers/reducers per job per node (so I can explicitly say: run at most 2 of this CPU-bound task at a time on any given node). However, the other way would work as well (on a 10-node cluster, I would set the job to a max of 20 tasks at a time globally), but it opens up the possibility that a node could be assigned more than 2 of that task.
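For reference, the closest existing knobs I know of are the TaskTracker's per-node maximums, but those cap tasks across all jobs at once rather than per job (a sketch of the hadoop-site.xml settings as I understand them; the values are just illustrative):

  <!-- hadoop-site.xml on each node: caps concurrent tasks per TaskTracker,
       but for ALL jobs together, not for any single job -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>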
I would work with whatever is easiest to implement, as either would be a vast improvement for me (I could run high numbers of network-latency-bound tasks without fear of CPU-bound tasks killing the cluster).

JG

> -----Original Message-----
> From: Chris K Wensel [mailto:ch...@wensel.net]
> Sent: Tuesday, February 03, 2009 11:34 AM
> To: core-user@hadoop.apache.org
> Subject: Re: Control over max map/reduce tasks per job
>
> Hey Jonathan
>
> Are you looking to limit the total number of concurrent mappers/
> reducers a single job can consume cluster-wide, or to limit the number
> per node?
>
> That is, you have X mappers/reducers, but can only allow N mappers/
> reducers to run at a time globally for a given job.
>
> Or, you are cool with all X running concurrently globally, but want to
> guarantee that no node can run more than N tasks from that job?
>
> Or both?
>
> Just reconciling the conversation we had last week with this thread.
>
> ckw
>
> On Feb 3, 2009, at 11:16 AM, Jonathan Gray wrote:
>
> > All,
> >
> > I have a few relatively small clusters (5-20 nodes) and am having
> > trouble keeping them loaded with my MR jobs.
> >
> > The primary issue is that I have jobs with drastically different
> > patterns. I have jobs that read/write to/from HBase or Hadoop with
> > minimal logic (network-throughput or IO bound), others that perform
> > crawling (network-latency bound), and one huge streaming parsing job
> > (very CPU bound; each task eats a core).
> >
> > I'd like to launch very large numbers of tasks for the
> > network-latency-bound jobs; however, the large CPU-bound job means I
> > have to keep the max maps allowed per node low enough not to starve
> > the DataNode and RegionServer.
> >
> > I'm an HBase dev but not familiar enough with the Hadoop MR code to
> > know what would be involved in implementing this. However, in talking
> > with other users, it seems like this would be a well-received option.
> >
> > I wanted to ping the list before filing an issue because it seems
> > like someone may have thought about this in the past.
> >
> > Thanks.
> >
> > Jonathan Gray
>
> --
> Chris K Wensel
> ch...@wensel.net
> http://www.cascading.org/
> http://www.scaleunlimited.com/
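To make the two semantics in the thread concrete, here is a hypothetical sketch of what the requested job-level settings might look like; neither property name exists in Hadoop today, and both are invented purely for illustration:

  <!-- hypothetical, NOT real Hadoop properties: submitted with the job -->
  <property>
    <!-- per-node cap: at most N of this job's tasks on any one node -->
    <name>mapred.job.max.tasks.per.node</name>
    <value>2</value>
  </property>
  <property>
    <!-- global cap: at most N of this job's tasks running cluster-wide -->
    <name>mapred.job.max.tasks</name>
    <value>20</value>
  </property>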