GitHub user nilday commented on the issue: https://github.com/apache/storm/pull/1674

Thanks for all the advice. Regarding the suggestions from @revans2:

1) We considered implementing the blacklist in nimbus before. As a newcomer to Storm contribution and to Clojure, I chose to implement it as a scheduler so that I could write the code in Java and keep the effect on the Storm core to a minimum, which limits the risk. The BlacklistScheduler currently uses the DefaultScheduler underneath, and we can easily change the code so that the underlying scheduler is configurable. I would like to try adding the blacklist to nimbus myself, as I can't wait for someone else to implement it for us.

2) Showing the blacklist on the UI is a good idea.

3) We have the same worry as you do. The PR I submitted this time has some code dealing with it. If so many supervisors are blacklisted that the cluster runs short of slots, the *DefaultBlacklistStrategy* uses its *releaseBlacklistWhenNeeded* method to temporarily release some supervisors from the blacklist so that we can try to assign work to them. It's not good enough, but at least it's an attempt. This is definitely a problem, and I think there must be a config option that switches the blacklist feature on or off until it is stable enough.

@knusbaum talked about a heuristic algorithm, which we had also had in mind. We think we could use the number of bad slots on a machine and the number of topologies those slots belong to in order to calculate the health of the machine. The idea is not mature enough, so we haven't implemented it, but we could write another IBlacklistStrategy to do that.

As I am a newbie, there may be a lot of barriers in front of me. I would be grateful for your assistance when I run into them. Thanks a lot.
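To make the "release when needed" idea in point 3 concrete, here is a minimal sketch of the kind of logic I mean. It is not the code from the PR: the class name `BlacklistReleaseSketch`, the method signature, and the plain map of supervisor id to slot count are all assumptions made up for illustration, and the real strategy may well choose which supervisors to release in a different order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Storm or the PR.
final class BlacklistReleaseSketch {

    /**
     * Chooses blacklisted supervisors to release temporarily when the
     * cluster does not have enough free slots.
     *
     * @param blacklistedSlots supervisor id -> number of slots it would provide
     *                         (iteration order decides who is released first)
     * @param availableSlots   slots currently usable without the blacklist
     * @param neededSlots      slots required by the pending assignments
     * @return ids of the supervisors to release, possibly empty
     */
    static List<String> releaseWhenNeeded(Map<String, Integer> blacklistedSlots,
                                          int availableSlots,
                                          int neededSlots) {
        List<String> released = new ArrayList<>();
        int shortage = neededSlots - availableSlots;
        for (Map.Entry<String, Integer> entry : blacklistedSlots.entrySet()) {
            if (shortage <= 0) {
                break; // enough slots already, keep the rest blacklisted
            }
            released.add(entry.getKey());
            shortage -= entry.getValue();
        }
        return released;
    }

    public static void main(String[] args) {
        Map<String, Integer> blacklisted = new LinkedHashMap<>();
        blacklisted.put("supervisor-a", 4);
        blacklisted.put("supervisor-b", 4);
        blacklisted.put("supervisor-c", 4);
        // 10 slots available, 16 needed: the shortage of 6 is covered by two supervisors.
        System.out.println(releaseWhenNeeded(blacklisted, 10, 16));
        // prints [supervisor-a, supervisor-b]
    }
}
```

The sketch releases only as many supervisors as the shortage requires, so the rest of the blacklist stays in effect. If every blacklisted supervisor together still cannot cover the shortage, all of them are released and the remaining shortage is left to the scheduler.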
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---