[ 
https://issues.apache.org/jira/browse/STORM-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chuanlei Ni reassigned STORM-909:
---------------------------------

    Assignee: Chuanlei Ni

> Automatic Black Listing of bad nodes
> ------------------------------------
>
>                 Key: STORM-909
>                 URL: https://issues.apache.org/jira/browse/STORM-909
>             Project: Apache Storm
>          Issue Type: Improvement
>            Reporter: Robert Joseph Evans
>            Assignee: Chuanlei Ni
>
> We should be able to detect and monitor the failure rate of workers on
> nodes, and derive a few different probabilities from it.  How likely is it
> that this worker will fail on this particular node in the next n minutes?
> How likely is it that all workers will fail on this particular node in the
> next n minutes?  How likely is it that this worker will fail on any node in
> the next n minutes?
> With these we should be able to detect bad nodes and blacklist them, and
> ideally trigger external systems that can take action to try to fix the
> nodes.  We should also be able to detect topologies that have bugs: in the
> common case warn about them, and in the worst case stop trying to run them.
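
As a rough illustration of the node-level estimate described above, here is a
minimal sketch. It is not Storm code: the class name NodeFailureTracker, its
methods, the sliding window, and the Poisson assumption (failures arrive
independently at the recently observed rate) are all hypothetical choices made
for this example. It estimates the chance that some worker fails on a node in
the next n minutes and blacklists nodes above a threshold.

```python
import math
from collections import defaultdict, deque


class NodeFailureTracker:
    """Hypothetical helper (not a Storm API): per-node worker failure history."""

    def __init__(self, window_mins=60.0):
        self.window_mins = window_mins
        # node -> timestamps (in minutes) of worker failures, oldest first
        self.failures = defaultdict(deque)

    def record_failure(self, node, now_mins):
        self.failures[node].append(now_mins)

    def _prune(self, node, now_mins):
        # Drop failures that have fallen out of the sliding window.
        events = self.failures[node]
        while events and events[0] <= now_mins - self.window_mins:
            events.popleft()

    def failure_probability(self, node, now_mins, horizon_mins):
        """P(at least one worker failure on `node` within `horizon_mins`).

        Assumption: failures form a Poisson process whose rate equals the
        rate observed over the last `window_mins` minutes.
        """
        self._prune(node, now_mins)
        rate_per_min = len(self.failures[node]) / self.window_mins
        return 1.0 - math.exp(-rate_per_min * horizon_mins)

    def blacklist(self, now_mins, horizon_mins, threshold):
        # Nodes whose estimated failure probability meets the threshold.
        return sorted(
            node for node in list(self.failures)
            if self.failure_probability(node, now_mins, horizon_mins) >= threshold
        )


tracker = NodeFailureTracker(window_mins=60.0)
for minute in (1, 5, 9, 14, 20, 31):
    tracker.record_failure("node-a", minute)
tracker.record_failure("node-b", 12)
bad_nodes = tracker.blacklist(now_mins=35, horizon_mins=10, threshold=0.5)
```

With these sample numbers, node-a has 6 failures in a 60-minute window (0.1 per
minute), so its 10-minute estimate is 1 - e^-1, about 0.63, and it is
blacklisted; node-b stays below the threshold. The per-worker and per-topology
probabilities from the description would need failures keyed by worker or
topology as well, which this sketch leaves out.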



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
