[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Lowe updated MAPREDUCE-5583:
----------------------------------
    Attachment: MAPREDUCE-5583v1.patch

Had an offline discussion about this with Arun, and he suggested using the ANY 
ask (i.e., host="*") to act as a limit on the request.  YARN only schedules 
containers for an application while its ANY ask is non-zero, so sending a 
request for 100 hosts and 10 racks but an ANY ask of 1 will return only 1 
container.  If the AM carefully modulates the ANY ask, it can self-limit 
without having to give up telling the RM about all of its locality desires.
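
As a rough illustration of the modulation described above (not the code in 
the attached patch), the ANY ask could be computed as the smaller of the 
pending task count and the remaining headroom under the limit.  The class 
name, method name and parameters below are hypothetical, and the sketch has 
no dependency on the YARN APIs.

```java
public class AnyAskLimiter {
    /**
     * Hypothetical helper: how many containers to ask for at host "*".
     *
     * @param pendingTasks tasks still waiting for a container
     * @param runningTasks tasks that already hold a container
     * @param limit        maximum concurrent tasks; a value <= 0 is treated
     *                     as "no limit" (an assumption of this sketch)
     */
    public static int anyAsk(int pendingTasks, int runningTasks, int limit) {
        if (pendingTasks < 0 || runningTasks < 0) {
            throw new IllegalArgumentException("task counts must be non-negative");
        }
        if (limit <= 0) {
            // Unlimited: ask for everything that is pending.
            return pendingTasks;
        }
        // Never ask for more than the room left under the limit.
        int headroom = Math.max(0, limit - runningTasks);
        return Math.min(pendingTasks, headroom);
    }

    public static void main(String[] args) {
        System.out.println(anyAsk(100, 0, 1));   // 1: limit of 1, nothing running
        System.out.println(anyAsk(100, 10, 10)); // 0: already at the limit
        System.out.println(anyAsk(3, 2, 10));    // 3: fewer pending than headroom
        System.out.println(anyAsk(100, 0, 0));   // 100: no limit configured
    }
}
```

The host and rack asks would be sent unchanged; only the ANY value is capped, 
so the RM still sees the full set of locality preferences.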

Attaching a patch that implements this approach.  It needs unit tests, but I've 
manually tested it, and maps and reduces are being limited accordingly.  The 
mapreduce.job.running.maps.limit and mapreduce.job.running.reduces.limit 
properties control it: 0 (the default) means no limit; any other value 
specifies the number of maps or reduces, respectively, that will be allowed to 
run concurrently.
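
To make the semantics of the two properties concrete, here is a minimal 
sketch that reads them and treats 0 (or a missing value) as "no limit".  It 
uses java.util.Properties as a stand-in for the job configuration; the class 
and method names are hypothetical, and treating negative values as "no limit" 
is an assumption of the sketch rather than something the patch is known to do.

```java
import java.util.OptionalInt;
import java.util.Properties;

public class RunningTaskLimits {
    // Property names as given in this comment.
    public static final String MAPS_LIMIT = "mapreduce.job.running.maps.limit";
    public static final String REDUCES_LIMIT = "mapreduce.job.running.reduces.limit";

    /**
     * Hypothetical reader: returns the configured limit, or an empty
     * OptionalInt when the property is unset or 0 (the default).
     */
    public static OptionalInt limit(Properties conf, String key) {
        int value = Integer.parseInt(conf.getProperty(key, "0").trim());
        // Assumption: negative values are also treated as "no limit".
        return value > 0 ? OptionalInt.of(value) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(MAPS_LIMIT, "50");   // at most 50 concurrent maps

        System.out.println("maps limited:    " + limit(conf, MAPS_LIMIT).isPresent());
        System.out.println("reduces limited: " + limit(conf, REDUCES_LIMIT).isPresent());
    }
}
```

In an actual job the values would presumably be set on the job configuration 
itself, for example with -D mapreduce.job.running.maps.limit=50 on the command 
line when the job goes through Hadoop's generic options parsing.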

Feedback appreciated.

> Ability to limit running map and reduce tasks
> ---------------------------------------------
>
>                 Key: MAPREDUCE-5583
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5583
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mr-am, mrv2
>    Affects Versions: 0.23.9, 2.1.1-beta
>            Reporter: Jason Lowe
>         Attachments: MAPREDUCE-5583v1.patch
>
>
> It would be nice if users could specify a limit to the number of map or 
> reduce tasks that are running simultaneously.  Occasionally users are 
> performing operations in tasks that can lead to DDoS scenarios if too many 
> tasks run simultaneously (e.g.: accessing a database, web service, etc.).  
> Having the ability to throttle the number of tasks simultaneously running 
> would provide users a way to mitigate issues with too many tasks on a large 
> cluster attempting to access a service at any one time.
> This is similar to the functionality that was requested by MAPREDUCE-224 and 
> implemented by HADOOP-3412 but dropped in mrv2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
