[ https://issues.apache.org/jira/browse/HADOOP-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matei Zaharia updated HADOOP-5170:
----------------------------------

    Attachment: tasklimits.patch

Here is a start at a patch for this issue. I added limits on running maps and 
reduces in the form of four parameters:
* mapred.max.maps.per.cluster
* mapred.max.reduces.per.cluster
* mapred.max.maps.per.node
* mapred.max.reduces.per.node

All the limits default to infinity (meaning no limit other than the number of 
slots on the node, as happens today).
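
As a usage sketch (only the parameter names below come from the patch; the 
driver class is made up, and setInt is just the standard JobConf setter), a 
job could cap itself like this:

    import java.io.IOException;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class TaskLimitExample {  // hypothetical driver class
      public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(TaskLimitExample.class);
        // Cap this job at 2 concurrent maps per node and 20 cluster-wide;
        // the reduce limits are left unset, so they stay at "no limit".
        conf.setInt("mapred.max.maps.per.node", 2);
        conf.setInt("mapred.max.maps.per.cluster", 20);
        // ... set input/output paths, mapper, reducer, etc. as usual ...
        JobClient.runJob(conf);
      }
    }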

These limits are enforced in JobInProgress and determine whether 
obtainNewMapTask and obtainNewReduceTask succeed, so they will work with any 
job scheduler (the default FIFO scheduler, the fair scheduler, or the 
capacity scheduler). For example, setting the per-cluster limit for a job 
under the FIFO scheduler caps the number of slots that job consumes (even if 
it has more tasks than that number of slots), leaving the remaining slots 
free for later jobs in the queue.
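
To make that concrete, the heart of the check is along these lines (a 
standalone sketch; the field and method names are mine, not necessarily the 
ones in tasklimits.patch):

    // Sketch of the limit logic; Long.MAX_VALUE stands in for "infinity".
    class MapTaskLimits {
      long maxMapsPerCluster = Long.MAX_VALUE;  // mapred.max.maps.per.cluster
      long maxMapsPerNode = Long.MAX_VALUE;     // mapred.max.maps.per.node

      // obtainNewMapTask would return null (schedule nothing) whenever this
      // returns false; obtainNewReduceTask would do the same for reduces.
      boolean canScheduleMap(long runningMapsInJob, long runningMapsOnNode) {
        return runningMapsInJob < maxMapsPerCluster
            && runningMapsOnNode < maxMapsPerNode;
      }
    }

Since the check lives in the job itself rather than in any scheduler, no 
scheduler changes are needed.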

Let me know whether this approach looks good and whether the parameter names 
make sense. I can then move the parameter strings into JobConf accessor 
methods so they aren't hard-coded in JobInProgress.
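
For example, something along these lines (the accessor names are 
illustrative; getInt/setInt are the existing Configuration helpers):

    // Hypothetical additions to JobConf:
    public int getMaxMapsPerNode() {
      return getInt("mapred.max.maps.per.node", Integer.MAX_VALUE);
    }

    public void setMaxMapsPerNode(int limit) {
      setInt("mapred.max.maps.per.node", limit);
    }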

> Set max map/reduce tasks on a per-job basis, either per-node or cluster-wide
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-5170
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5170
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Jonathan Gray
>         Attachments: tasklimits.patch
>
>
> There are a number of use cases for being able to do this.  The focus of 
> this jira should be on finding the simplest implementation that satisfies 
> the most use cases.
> This could be implemented as either a per-node maximum or a cluster-wide 
> maximum.  It seems that for most uses the former is preferable; however, 
> either would fulfill the requirements of this jira.
> Some of the reasons for allowing this feature (mine and from others on list):
> - I have some very large CPU-bound jobs.  I am forced to keep the max 
> map/node limit at 2 or 3 (on a 4-core node) so that I do not starve the 
> DataNode and RegionServer.  I have other jobs that are network-latency 
> bound and would like to run many of their tasks concurrently on each node. 
> Though I can thread some jobs, some use cases are difficult to thread 
> (scanning from HBase), and threading adds significant complexity to the 
> job compared with letting Hadoop handle the concurrency.
> - Poor assignment of tasks to nodes creates situations where you have 
> multiple reducers on a single node while other nodes receive none.  A 
> limit of 1 reducer per node for that job would prevent this from 
> happening.  (only works with the per-node limit)
> - Poor man's MR job virtualization.  Since we can limit a job's resources, 
> this gives much more control over allocating and dividing up the resources 
> of a large cluster.  (makes most sense with the cluster-wide limit)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.