[jira] Commented: (HADOOP-5170) Set max map/reduce tasks on a per-job basis, either per-node or cluster-wide

Matei Zaharia (JIRA) Thu, 07 May 2009 12:11:11 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707048#action_12707048
 ]


Matei Zaharia commented on HADOOP-5170:
---------------------------------------

Hemanth, if you submit another job that is also CPU-bound, it may interfere 
with the first. However, if you submit one that is IO-bound, it will be fine. 
This task limit feature isn't meant to solve the general resource allocation 
problem, only to give you a way to limit resource consumption if you know that 
you have one job with very resource-intensive tasks and many jobs with less 
resource-intensive tasks. Because it's such a simple feature, I think it's a 
good one to add before building any kind of automatic resource-aware 
scheduling. It will solve many users' problems in the short term, as evidenced 
by the number of votes and watchers.
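
To make the two variants concrete, here is a minimal sketch of the admission 
check a per-job task limit implies. It is not the code in tasklimits.patch and 
does not use Hadoop's scheduler API; the names JobLimits, JobState, 
can_launch_on and launch_on are hypothetical, and it assumes the scheduler 
tracks running tasks per node for each job.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class JobLimits:
    # Hypothetical per-job caps; None means "no limit".
    max_per_node: Optional[int] = None
    max_cluster_wide: Optional[int] = None


@dataclass
class JobState:
    limits: JobLimits
    # Number of this job's tasks currently running on each node.
    running_per_node: Dict[str, int] = field(default_factory=dict)

    def running_total(self) -> int:
        return sum(self.running_per_node.values())

    def can_launch_on(self, node: str) -> bool:
        """Return True if one more task of this job may start on `node`."""
        lim = self.limits
        if (lim.max_cluster_wide is not None
                and self.running_total() >= lim.max_cluster_wide):
            return False
        if (lim.max_per_node is not None
                and self.running_per_node.get(node, 0) >= lim.max_per_node):
            return False
        return True

    def launch_on(self, node: str) -> bool:
        """Record a task launch if the limits allow it; report success."""
        if not self.can_launch_on(node):
            return False
        self.running_per_node[node] = self.running_per_node.get(node, 0) + 1
        return True
```

Under this sketch, a CPU-bound job created with JobLimits(max_per_node=2) 
stops receiving tasks on a node once two are running there, while an IO-bound 
job left with the default JobLimits() is unaffected by the check.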

> Set max map/reduce tasks on a per-job basis, either per-node or cluster-wide
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-5170
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5170
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Jonathan Gray
>         Attachments: tasklimits.patch
>
>
> There are a number of use cases for being able to do this.  The focus of this 
> jira should be on finding what would be the simplest to implement that would 
> satisfy the most use cases.
> This could be implemented as either a per-node maximum or a cluster-wide 
> maximum.  It seems that for most uses the former is preferable; however, 
> either would fulfill the requirements of this jira.
> Some of the reasons for allowing this feature (mine and from others on list):
> - I have some very large CPU-bound jobs.  I am forced to keep the max 
> map/node limit at 2 or 3 (on a 4 core node) so that I do not starve the 
> Datanode and Regionserver.  I have other jobs that are network latency bound 
> and would like to be able to run high numbers of them concurrently on each 
> node.  Though I can thread some jobs, there are some use cases that are 
> difficult to thread (scanning from HBase), and threading adds significant 
> complexity to the job compared with letting Hadoop handle the concurrency.
> - Poor assignment of tasks to nodes creates some situations where you have 
> multiple reducers on a single node but other nodes that received none.  A 
> limit of 1 reducer per node for that job would prevent that from happening. 
> (only works with per-node limit)
> - Poor man's MR job virtualization.  Since we can limit a job's resources, this 
> gives much more control in allocating and dividing up resources of a large 
> cluster.  (makes most sense w/ cluster-wide limit)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
