[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799382#comment-13799382
 ] 

Francis Liu commented on MAPREDUCE-5583:
----------------------------------------

{quote}
Cluster with 100,000 containers, 1,000 jobs, each with 100000 tasks, and 
specifies that they can only run 5 tasks. So, you are now only using 5% of the 
cluster and no one makes progress leading to very poor utilization and 
peanut-buttering effect.
{quote}
Given that YARN is supposed to engender a diverse set of AMs, this seems to be 
a problem that should be solved by the RM anyway. I'm not that familiar with 
the scheduler, but if we were to use queues to limit the number of tasks, 
wouldn't the outcome be the same, since we're bound by the upper-limit config 
of the max jobs?
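
For reference, the arithmetic behind the quoted 5% figure can be sketched as 
below. This is only an illustration: the class and method names are 
hypothetical and not part of Hadoop, and it assumes every job is capped at the 
same number of running tasks.

```java
public class UtilizationSketch {

    /**
     * Containers in use when each job may run at most perJobLimit tasks at once.
     * All names here are hypothetical; this is not a Hadoop API.
     */
    static long runningContainers(long clusterContainers, long jobs,
                                  long tasksPerJob, long perJobLimit) {
        // A job cannot run more tasks than it has, nor more than its limit.
        long perJob = Math.min(tasksPerJob, perJobLimit);
        // The cluster cannot run more tasks than it has containers.
        return Math.min(clusterContainers, jobs * perJob);
    }

    /** Fraction of the cluster's containers that are busy, between 0 and 1. */
    static double utilization(long clusterContainers, long jobs,
                              long tasksPerJob, long perJobLimit) {
        return (double) runningContainers(clusterContainers, jobs, tasksPerJob, perJobLimit)
                / clusterContainers;
    }

    public static void main(String[] args) {
        // Numbers from the quoted scenario: 100,000 containers, 1,000 jobs,
        // 100,000 tasks per job, each job limited to 5 running tasks.
        long used = runningContainers(100_000, 1_000, 100_000, 5);
        double fraction = utilization(100_000, 1_000, 100_000, 5);
        System.out.println(used + " containers in use, fraction of cluster = " + fraction);
    }
}
```

The same calculation applies whether the cap comes from a per-job setting or 
from a queue limit, which is the point of the question above.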

{quote}
Some form of admin control (e.g. queue with a max-cap) for a small number of 
use-cases where you actually need this feature is much safer.
{quote}
We have a number of use cases, and that number is growing. I'm hoping we can 
come up with a solution that does not require users to hack the MRv2 AM. This 
would be useful not only as a manual MR config; I can also see it being useful 
as something an InputFormat/OutputFormat sets automatically, or maybe even 
something that DSLs can leverage. Apart from queues, some users control this by 
limiting the number of reducers or by controlling the number of map tasks. The 
latter is done by merging split files, which is undesirable because it makes a 
task failure costly. So it'd be great if we could have a clean way of doing 
this.
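
To illustrate why merging splits makes a failure costly, here is a rough 
sketch. The split count and concurrency target are made-up numbers chosen for 
illustration, and the class and method names are hypothetical rather than 
anything in Hadoop.

```java
public class SplitMergeCostSketch {

    /**
     * Splits handled by each task when totalSplits are merged into `tasks`
     * tasks, rounded up. Hypothetical helper, not a Hadoop API.
     */
    static long splitsPerMergedTask(long totalSplits, long tasks) {
        return (totalSplits + tasks - 1) / tasks; // ceiling division
    }

    public static void main(String[] args) {
        long totalSplits = 10_000;   // assumed input size, illustration only
        long maxConcurrent = 5;      // desired cap on simultaneously running maps

        // Workaround: merge splits so that only maxConcurrent map tasks exist.
        long merged = splitsPerMergedTask(totalSplits, maxConcurrent);
        System.out.println("Merged splits: a failed task redoes up to "
                + merged + " splits of work");

        // With a limit on running tasks, each task would still cover one split.
        System.out.println("Running-task limit: a failed task redoes 1 split of work");
    }
}
```

With a limit on running tasks, the job keeps its fine-grained tasks and only 
the concurrency is capped, so a retry repeats far less work.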



> Ability to limit running map and reduce tasks
> ---------------------------------------------
>
>                 Key: MAPREDUCE-5583
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5583
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mr-am, mrv2
>    Affects Versions: 0.23.9, 2.1.1-beta
>            Reporter: Jason Lowe
>
> It would be nice if users could specify a limit to the number of map or 
> reduce tasks that are running simultaneously.  Occasionally users are 
> performing operations in tasks that can lead to DDoS scenarios if too many 
> tasks run simultaneously (e.g.: accessing a database, web service, etc.).  
> Having the ability to throttle the number of tasks simultaneously running 
> would provide users a way to mitigate issues with too many tasks on a large 
> cluster attempting to access a service at any one time.
> This is similar to the functionality requested by MAPREDUCE-224 and 
> implemented by HADOOP-3412 but was dropped in mrv2.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
