shameersss1 commented on PR #7138: URL: https://github.com/apache/hadoop/pull/7138#issuecomment-2531512162
@brumi1024 - Thanks for looking into this.

> what is the reason behind changing the default of this setting?

1. The current default scheduling mechanism is synchronous (node-heartbeat driven), which is inefficient when a large number of containers need to be allocated.
2. It also has additional issues: for example, scheduling won't happen if node heartbeats are lost due to a network issue.
3. @wangdatan did an amazing job of making async scheduling production ready. Refer to https://issues.apache.org/jira/browse/YARN-7327?focusedCommentId=16205259&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16205259 for benchmark details.
4. The above benchmark shows that async scheduling throughput is better than sync scheduling throughput.

Hence the proposal here is to change the default scheduling strategy for the capacity scheduler from synchronous to asynchronous. Companies such as Alibaba Cloud already use this in production: https://www.alibabacloud.com/help/en/emr/emr-on-ecs/user-guide/yarn-schedulers

@brumi1024 - Do you think there is any blocker/issue in enabling it by default?

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
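To illustrate points 1 and 2 of the comment, here is a minimal toy sketch in Java. It is not the CapacityScheduler implementation: `SchedulingSketch`, `Node`, `scheduleOnHeartbeats`, and `scheduleAsync` are hypothetical names invented for this example, and the model ignores queues, resource sizes, and the node-liveness checks a real scheduler needs. The setting under discussion is presumably `yarn.scheduler.capacity.schedule-asynchronously.enable`; the sketch only shows why allocation tied to heartbeats can leave capacity unused in a round where a heartbeat is delayed.

```java
import java.util.List;

public class SchedulingSketch {

    /** Toy node: a name, free container slots, and whether its heartbeat arrived this round. */
    static final class Node {
        final String name;
        int freeSlots;
        final boolean heartbeatArrived;

        Node(String name, int freeSlots, boolean heartbeatArrived) {
            this.name = name;
            this.freeSlots = freeSlots;
            this.heartbeatArrived = heartbeatArrived;
        }
    }

    /** Places up to `pending` containers on one node and returns how many were placed. */
    static int allocateOn(Node node, int pending) {
        int placed = Math.min(node.freeSlots, pending);
        node.freeSlots -= placed;
        return placed;
    }

    /** Heartbeat-driven model: a node is only considered when its heartbeat arrived. */
    static int scheduleOnHeartbeats(List<Node> nodes, int pending) {
        int allocated = 0;
        for (Node node : nodes) {
            if (node.heartbeatArrived) {
                allocated += allocateOn(node, pending - allocated);
            }
        }
        return allocated;
    }

    /** Asynchronous model: a scheduler pass walks every known node, heartbeat or not. */
    static int scheduleAsync(List<Node> nodes, int pending) {
        int allocated = 0;
        for (Node node : nodes) {
            allocated += allocateOn(node, pending - allocated);
        }
        return allocated;
    }

    public static void main(String[] args) {
        // Two nodes, four pending containers; n2's heartbeat is delayed this round.
        int sync = scheduleOnHeartbeats(
                List.of(new Node("n1", 2, true), new Node("n2", 3, false)), 4);
        int async = scheduleAsync(
                List.of(new Node("n1", 2, true), new Node("n2", 3, false)), 4);
        System.out.println("heartbeat-driven: " + sync + ", async: " + async);
        // prints "heartbeat-driven: 2, async: 4"
    }
}
```

In this toy round the heartbeat-driven pass places only two of the four pending containers because it never looks at `n2`, while the asynchronous pass places all four.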
