[
https://issues.apache.org/jira/browse/STORM-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chuanlei Ni reassigned STORM-909:
---------------------------------
Assignee: Chuanlei Ni
> Automatic Black Listing of bad nodes
> ------------------------------------
>
> Key: STORM-909
> URL: https://issues.apache.org/jira/browse/STORM-909
> Project: Apache Storm
> Issue Type: Improvement
> Reporter: Robert Joseph Evans
> Assignee: Chuanlei Ni
>
> We should be able to detect and monitor the failure rate of workers on nodes,
> and come up with a few different probabilities: How likely is it that this
> worker will fail on this particular node in the next n minutes? How likely is
> it that all workers will fail on this particular node in the next n minutes?
> How likely is it that this worker will fail on any node in the next n minutes?
> With these we should be able to detect bad nodes and blacklist them, and
> ideally trigger external systems that can take action to try to fix the
> nodes. We should also be able to detect topologies that have bugs: in the
> common case warn their owners, and in the worst case stop trying to run them.
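
A rough sketch of how the probabilities described above might be estimated.
This is not Storm code: FailureTracker, its methods, the 60-minute window, and
the blacklist threshold are all hypothetical names and values chosen for
illustration. It assumes failures arrive as a Poisson process, so the chance of
at least one failure in the next n minutes is 1 - exp(-rate * n), with the rate
taken from a sliding window of recent failures. The node-level and
topology-level figures pool failures across workers, which is only a proxy for
the "all workers fail" and "fails on any node" questions.

```python
import math
from collections import defaultdict


class FailureTracker:
    """Sliding-window failure counts per (node, topology). Hypothetical API."""

    def __init__(self, window_mins=60.0):
        self.window_mins = window_mins
        # (node, topology) -> list of failure timestamps, in minutes
        self.failures = defaultdict(list)

    def record_failure(self, node, topology, t_mins):
        self.failures[(node, topology)].append(t_mins)

    def _rate(self, now_mins, node=None, topology=None):
        """Failures per minute inside the window, filtered by node and/or topology."""
        start = now_mins - self.window_mins
        count = sum(
            1
            for (n, t), stamps in self.failures.items()
            if (node is None or n == node) and (topology is None or t == topology)
            for ts in stamps
            if start < ts <= now_mins
        )
        return count / self.window_mins

    def p_fail(self, now_mins, horizon_mins, node=None, topology=None):
        """P(at least one failure within the horizon), assuming a Poisson process."""
        rate = self._rate(now_mins, node, topology)
        return 1.0 - math.exp(-rate * horizon_mins)

    def bad_nodes(self, now_mins, horizon_mins, threshold):
        """Nodes whose pooled failure probability meets the (assumed) threshold."""
        nodes = {n for (n, _topology) in self.failures}
        return sorted(
            n for n in nodes
            if self.p_fail(now_mins, horizon_mins, node=n) >= threshold
        )


if __name__ == "__main__":
    tracker = FailureTracker(window_mins=60.0)
    for t in (10, 20, 30):
        tracker.record_failure("node-a", "topo-1", t)
    tracker.record_failure("node-b", "topo-1", 30)
    # This worker on this node, this worker on any node, and blacklist candidates.
    print(tracker.p_fail(60, 10, node="node-a", topology="topo-1"))
    print(tracker.p_fail(60, 10, topology="topo-1"))
    print(tracker.bad_nodes(60, 10, threshold=0.3))
```

A topology whose p_fail stays high no matter which node it runs on points at a
bug in the topology rather than a bad node, which is the case the description
says should lead to a warning or, at worst, to no longer scheduling it.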
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)