[ https://issues.apache.org/jira/browse/STORM-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xin Wang updated STORM-2083:
----------------------------
Fix Version/s: (was: 1.0.3)
               (was: 1.1.0)
               (was: 1.0.2)
               (was: 1.0.1)
> Blacklist Scheduler
> -------------------
>
> Key: STORM-2083
> URL: https://issues.apache.org/jira/browse/STORM-2083
> Project: Apache Storm
> Issue Type: New Feature
> Components: storm-core
> Reporter: Howard Lee
> Labels: blacklist, scheduling
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> My company experienced a production fault in which a critical switch
> caused an unstable network for a set of machines, with a packet loss rate of
> 30%-50%. In such a fault, the supervisors and workers on those machines are
> not definitively dead, a case that would be easy to handle. Instead, they are
> still alive but very unstable, and they occasionally lose heartbeats to
> Nimbus. In such circumstances, Nimbus still assigns jobs to these machines
> but soon finds them invalid again, resulting in very slow convergence to a
> stable state.
> To deal with such unstable cases, we intend to implement a blacklist
> scheduler, which will add the unstable nodes (supervisors, slots) to a
> blacklist temporarily and resume them later.
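
The idea in the description (blacklist a node temporarily once it misses heartbeats too often, then resume it later) could be sketched roughly as follows. This is only an illustration and not Storm's implementation or API: the class name, method names, and the three thresholds (tolerance_count, tolerance_window, resume_after) are all assumptions made for this example.

```python
from collections import defaultdict, deque


class BlacklistTracker:
    """Hypothetical helper, not part of Storm: tracks flaky nodes by missed heartbeats."""

    def __init__(self, tolerance_count=3, tolerance_window=300.0, resume_after=1800.0):
        self.tolerance_count = tolerance_count    # misses needed to blacklist (assumed)
        self.tolerance_window = tolerance_window  # seconds over which misses count (assumed)
        self.resume_after = resume_after          # seconds a node stays blacklisted (assumed)
        self._misses = defaultdict(deque)         # node id -> timestamps of recent misses
        self._blacklisted_until = {}              # node id -> time at which it is resumed

    def record_missed_heartbeat(self, node, now):
        """Note a missed heartbeat; blacklist the node if it misses too often."""
        misses = self._misses[node]
        misses.append(now)
        # Forget misses that fell out of the tolerance window.
        while misses and now - misses[0] > self.tolerance_window:
            misses.popleft()
        if len(misses) >= self.tolerance_count:
            self._blacklisted_until[node] = now + self.resume_after
            misses.clear()

    def is_blacklisted(self, node, now):
        """Return True while the node is blacklisted; resume it once the time is up."""
        until = self._blacklisted_until.get(node)
        if until is None:
            return False
        if now >= until:
            del self._blacklisted_until[node]
            return False
        return True

    def schedulable(self, nodes, now):
        """Filter out nodes that are currently blacklisted, keeping input order."""
        return [n for n in nodes if not self.is_blacklisted(n, now)]
```

A scheduler would call schedulable() before assigning work, so a node that keeps flapping is skipped for a while instead of being assigned jobs and then invalidated again. Timestamps are passed in explicitly to keep the sketch deterministic.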
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)