[ 
https://issues.apache.org/jira/browse/STORM-37?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rick Kellogg updated STORM-37:
------------------------------
    Component/s: storm-core

> Auto-deactivate topologies that are continuously erroring
> ---------------------------------------------------------
>
>                 Key: STORM-37
>                 URL: https://issues.apache.org/jira/browse/STORM-37
>             Project: Apache Storm
>          Issue Type: Improvement
>          Components: storm-core
>            Reporter: James Xu
>            Priority: Minor
>
> The isolation scheduler, the way Mesos makes resource offers (in 
> storm-mesos), and continuously erroring topologies interact badly. 
> When an isolated topology always errors, the isolation scheduler kills 
> non-isolated topologies in every scheduling iteration to free up resources 
> for it, so no non-isolated topology can run.
> A nice fix for this would be for Nimbus to automatically deactivate 
> topologies that are continuously erroring. It should count worker failures 
> over the last Y minutes and, if there are more than X, put the topology 
> into a "DEACTIVATED_ERRORED" state.
> This would also benefit non-Mesos clusters by avoiding the cost of 
> continuous JVM startups from erroring topologies.
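
The "more than X worker failures in the last Y minutes" check could be sketched as a sliding-window counter. This is only an illustration of the idea, not code from Storm or Nimbus: the class name WorkerFailureWindow, its methods, and the millisecond timestamps are all hypothetical, and it assumes failures are reported in non-decreasing time order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical helper (not part of Storm): tracks worker failures for one
 * topology inside a sliding time window.
 */
class WorkerFailureWindow {
    private final int maxFailures;    // "X": failures tolerated inside the window
    private final long windowMillis;  // "Y minutes", expressed in milliseconds
    private final Deque<Long> failureTimes = new ArrayDeque<>();

    WorkerFailureWindow(int maxFailures, long windowMillis) {
        this.maxFailures = maxFailures;
        this.windowMillis = windowMillis;
    }

    /**
     * Records a worker failure at nowMillis. Returns true when the window now
     * holds more than maxFailures failures, i.e. when the caller might move
     * the topology into a DEACTIVATED_ERRORED-like state.
     */
    boolean recordFailure(long nowMillis) {
        failureTimes.addLast(nowMillis);
        evictExpired(nowMillis);
        return failureTimes.size() > maxFailures;
    }

    /** Number of failures still inside the window as of nowMillis. */
    int failuresInWindow(long nowMillis) {
        evictExpired(nowMillis);
        return failureTimes.size();
    }

    // Drops failures that are windowMillis old or older.
    // Assumes timestamps were added in non-decreasing order.
    private void evictExpired(long nowMillis) {
        long cutoff = nowMillis - windowMillis;
        while (!failureTimes.isEmpty() && failureTimes.peekFirst() <= cutoff) {
            failureTimes.removeFirst();
        }
    }
}
```

A scheduler loop would keep one such window per topology and call recordFailure each time a worker of that topology dies. Keeping only timestamps in a deque makes eviction cheap and bounds memory to the failures inside the window.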



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
