shameersss1 commented on PR #7138:
URL: https://github.com/apache/hadoop/pull/7138#issuecomment-2531512162

   @brumi1024  - Thanks for looking into this.
   
   > what is the reason behind changing the default of this setting? 
   
   1. The current default scheduling mechanism is synchronous (node-heartbeat 
driven), which is inefficient when a large number of containers need to be 
allocated.
   2. It has a further drawback: no scheduling happens on a node whose 
heartbeats are lost, for example because of a network issue.
   3. @wangdatan did an amazing job of making async scheduling 
production-ready. Refer to 
https://issues.apache.org/jira/browse/YARN-7327?focusedCommentId=16205259&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16205259
 for benchmark details.
   4. That benchmark shows that async scheduling achieves higher throughput 
than sync scheduling.
   
   Hence the proposal here is to change the default scheduling strategy 
for the capacity scheduler from synchronous to asynchronous. Companies such as 
Alibaba Cloud already use this in production: 
https://www.alibabacloud.com/help/en/emr/emr-on-ecs/user-guide/yarn-schedulers
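   
   For reference, a minimal sketch of how a cluster can opt in to 
asynchronous scheduling explicitly today, independent of the default. The 
property name is not quoted in this comment; it is assumed here to be 
`yarn.scheduler.capacity.schedule-asynchronously.enable`, set in 
`capacity-scheduler.xml`. Verify it against the CapacityScheduler 
documentation for your Hadoop version.

   ```xml
   <!-- capacity-scheduler.xml: assumed property name, not taken from this comment -->
   <property>
     <name>yarn.scheduler.capacity.schedule-asynchronously.enable</name>
     <!-- true = allocate containers from background scheduler threads
          instead of only in response to node heartbeats -->
     <value>true</value>
   </property>
   ```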
   
   @brumi1024  - Do you see any blocker or issue with enabling it by 
default?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

