[
https://issues.apache.org/jira/browse/STORM-37?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rick Kellogg updated STORM-37:
------------------------------
Component/s: storm-core
> Auto-deactivate topologies that are continuously erroring
> ---------------------------------------------------------
>
> Key: STORM-37
> URL: https://issues.apache.org/jira/browse/STORM-37
> Project: Apache Storm
> Issue Type: Improvement
> Components: storm-core
> Reporter: James Xu
> Priority: Minor
>
> There is a bad interaction between the isolation scheduler, the way Mesos
> makes resource offers (in storm-mesos), and continuously erroring topologies.
> The effect is that no non-isolated topology can run: the isolation scheduler
> has to kill non-isolated topologies to free up resources for isolated
> topologies in the next scheduling iteration, and it does so continuously
> because the isolated topology always errors.
> A nice fix for this would be for Nimbus to automatically deactivate
> topologies that are continuously erroring. It should count the worker
> failures (X) in the last Y minutes and put the topology into the
> "DEACTIVATED_ERRORED" state if there are too many errors.
> This would also be good for non-Mesos clusters in order to avoid the cost of
> continuous JVM startups from erroring topologies.
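
The check proposed above (count worker failures over the last Y minutes and
deactivate when there are too many) could be sketched roughly as a sliding
window. This is only an illustration, not Nimbus code: the class
WorkerFailureWindow, its methods, and the reading of "too many" as "more than
a configured limit" are all assumptions made for the example, and the caller
is assumed to supply timestamps in non-decreasing order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper, not part of Storm: one instance per topology. */
class WorkerFailureWindow {
    private final int maxFailures;    // assumed limit on failures per window
    private final long windowMillis;  // the "last Y minutes", in milliseconds
    private final Deque<Long> failureTimes = new ArrayDeque<>();

    WorkerFailureWindow(int maxFailures, long windowMillis) {
        this.maxFailures = maxFailures;
        this.windowMillis = windowMillis;
    }

    /** Records a worker failure and reports whether the limit is now exceeded. */
    boolean recordFailure(long nowMillis) {
        failureTimes.addLast(nowMillis);
        return shouldDeactivate(nowMillis);
    }

    /** True if more than maxFailures failures fall inside the current window. */
    boolean shouldDeactivate(long nowMillis) {
        long cutoff = nowMillis - windowMillis;
        // Drop failures that have aged out of the window (oldest are at the head).
        while (!failureTimes.isEmpty() && failureTimes.peekFirst() <= cutoff) {
            failureTimes.removeFirst();
        }
        return failureTimes.size() > maxFailures;
    }
}
```

A scheduler loop would call recordFailure each time a worker of the topology
dies and, on a true result, move the topology to the proposed
"DEACTIVATED_ERRORED" state instead of rescheduling it. Keeping only the
timestamps inside the window bounds memory use by the failure rate rather
than by the topology's lifetime.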
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)