[ https://issues.apache.org/jira/browse/STORM-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15051042#comment-15051042 ]
ASF GitHub Bot commented on STORM-1383:
---------------------------------------
Github user ppoulosk commented on the pull request:
https://github.com/apache/storm/pull/938#issuecomment-163641863
+1
> Supervisors should not crash if nimbus is unavailable
> -----------------------------------------------------
>
> Key: STORM-1383
> URL: https://issues.apache.org/jira/browse/STORM-1383
> Project: Apache Storm
> Issue Type: Improvement
> Components: storm-core
> Affects Versions: 0.11.0
> Reporter: Derek Dagit
> Assignee: Derek Dagit
>
> In cases of maintenance or unexpected downtime of nimbus nodes, supervisors
> will crash in a loop. This can cause a lot of confusion among users
> (supervisors crash repeatedly) and admins (monitoring/alerting triggered for
> the entire cluster).
> Supervisors periodically check with nimbus to synchronize blob versions, and
> as part of this, a connection is made to the leader nimbus daemon. Formerly,
> supervisors did not periodically contact nimbus, and so nimbus downtime did
> not cascade to cluster-wide supervisor failures.
> It might be nice for supervisors to handle the case where nimbus cannot be
> contacted and continue with their normal loop instead of crashing.
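The proposed behavior (treat an unreachable nimbus as a skipped round instead of a fatal error) can be sketched as below. This is a minimal illustration, not the change in PR #938: `NimbusConnection`, `BlobSyncLoop`, `syncOnce`, and the use of `IOException` to signal "nimbus unreachable" are all hypothetical names and assumptions, not Storm's actual `NimbusClient` or supervisor code.

```java
import java.io.IOException;

// Hypothetical stand-in for a client connected to the leader nimbus.
// This is NOT Storm's real NimbusClient API; it exists only for illustration.
interface NimbusConnection {
    // Returns the current version of the blob stored under `key`.
    // Assumption: throws IOException when nimbus cannot be reached.
    long getBlobVersion(String key) throws IOException;
}

// Hypothetical sketch of one supervisor's periodic blob-version check.
class BlobSyncLoop {
    private final NimbusConnection nimbus;
    private long localVersion;   // version of the blob the supervisor currently has
    private int skippedRounds;   // rounds skipped because nimbus was unreachable

    BlobSyncLoop(NimbusConnection nimbus, long initialVersion) {
        this.nimbus = nimbus;
        this.localVersion = initialVersion;
    }

    // Runs a single iteration of the periodic check.
    // Returns true if nimbus answered, false if the round was skipped.
    boolean syncOnce(String key) {
        try {
            long remoteVersion = nimbus.getBlobVersion(key);
            if (remoteVersion != localVersion) {
                // A real supervisor would download the new blob here;
                // this sketch only records the new version.
                localVersion = remoteVersion;
            }
            return true;
        } catch (IOException e) {
            // Nimbus is unavailable: log, keep the current blob, and let the
            // normal loop try again next round instead of crashing the process.
            skippedRounds++;
            System.err.println("nimbus unavailable, retrying next round: " + e.getMessage());
            return false;
        }
    }

    long localVersion() { return localVersion; }

    int skippedRounds() { return skippedRounds; }
}
```

In this sketch only the connection failure is caught, so unrelated bugs still surface; the supervisor keeps running with its last-known blob version and picks up the new one once nimbus is reachable again.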
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)