[ https://issues.apache.org/jira/browse/SLIDER-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856345#comment-15856345 ]
Billie Rinaldi commented on SLIDER-1199:
----------------------------------------

[~gsaha], thanks for the review. I was thinking that this block should be synchronized with [SliderAppMaster#executeNodeReview|https://github.com/apache/incubator-slider/blob/develop/slider-core/src/main/java/org/apache/slider/server/appmaster/SliderAppMaster.java#L1948-L1966]. That method calls appState.reviewRequestAndReleaseNodes (which is calling appState.updateBlacklist), and then executes all of the operations returned, which may include a blacklist operation. I think "creating a blacklist operation and executing it" should be synchronized.

Blacklist is a YARN concept. updateBlacklist, blacklistAdditions, and blacklistRemovals are taken directly from the YARN API. I think we should continue to use the same terminology that YARN does.

> Blacklist nodes that exceed the node failure threshold for a role
> -----------------------------------------------------------------
>
>                 Key: SLIDER-1199
>                 URL: https://issues.apache.org/jira/browse/SLIDER-1199
>             Project: Slider
>          Issue Type: Bug
>          Components: appmaster
>            Reporter: Billie Rinaldi
>            Assignee: Billie Rinaldi
>             Fix For: Slider 1.0.0
>
>         Attachments: SLIDER-1199.1.patch, SLIDER-1199.2.patch, SLIDER-1199.3.patch, SLIDER-1199.4.patch
>
>
> From the code, it seems like when the node failure threshold for a role is exceeded, that node is no longer suggested for placement. But there is nothing preventing the RM from selecting the node again. If the node were blacklisted, perhaps that would prevent new allocations on problem nodes.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
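
For illustration only, here is a minimal sketch of the point made in the comment above: creating a blacklist operation and executing it happen under the same lock, so two callers cannot compute deltas in one order and apply them in another. Every type here (BlacklistSyncSketch, FakeAppState, FakeResourceManager, BlacklistOperation, reviewAndApplyBlacklist) is a hypothetical stand-in, not Slider's or YARN's actual code; only the names updateBlacklist, blacklistAdditions, and blacklistRemovals are borrowed from the terminology in the comment.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: none of these types are Slider's or YARN's real classes.
final class BlacklistSyncSketch {

  /** A pending blacklist change, with fields named after the YARN terms. */
  static final class BlacklistOperation {
    final List<String> blacklistAdditions;
    final List<String> blacklistRemovals;

    BlacklistOperation(List<String> additions, List<String> removals) {
      this.blacklistAdditions = additions;
      this.blacklistRemovals = removals;
    }
  }

  /** Stand-in for the RM client; it only records the blacklist in effect. */
  static final class FakeResourceManager {
    final Set<String> blacklist = new HashSet<>();

    void updateBlacklist(List<String> additions, List<String> removals) {
      blacklist.addAll(additions);
      blacklist.removeAll(removals);
    }
  }

  /** Stand-in for the app state: remembers what it has already requested. */
  static final class FakeAppState {
    private final Set<String> failedNodes = new HashSet<>();
    private final Set<String> requested = new HashSet<>();

    synchronized void markFailed(String node) { failedNodes.add(node); }

    synchronized void markRecovered(String node) { failedNodes.remove(node); }

    /** Returns the delta against the last request, or null if nothing changed. */
    synchronized BlacklistOperation updateBlacklist() {
      List<String> additions = new ArrayList<>(failedNodes);
      additions.removeAll(requested);
      List<String> removals = new ArrayList<>(requested);
      removals.removeAll(failedNodes);
      if (additions.isEmpty() && removals.isEmpty()) {
        return null;
      }
      // The state assumes the delta will be applied, so it must be applied
      // before any later delta is computed and executed.
      requested.addAll(additions);
      requested.removeAll(removals);
      return new BlacklistOperation(additions, removals);
    }
  }

  final FakeAppState appState = new FakeAppState();
  final FakeResourceManager rm = new FakeResourceManager();

  /** Creates the blacklist operation and executes it under one lock. */
  synchronized void reviewAndApplyBlacklist() {
    BlacklistOperation op = appState.updateBlacklist();
    if (op != null) {
      rm.updateBlacklist(op.blacklistAdditions, op.blacklistRemovals);
    }
  }

  public static void main(String[] args) {
    BlacklistSyncSketch sketch = new BlacklistSyncSketch();
    sketch.appState.markFailed("node1");
    sketch.reviewAndApplyBlacklist();
    System.out.println(sketch.rm.blacklist);
  }
}
```

If reviewAndApplyBlacklist were not synchronized in this sketch, one thread could compute "add node1", a second could compute "remove node1", and the removal could reach the resource manager first, leaving node1 blacklisted while the app state believes it is not.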