[
https://issues.apache.org/jira/browse/STORM-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15062486#comment-15062486
]
ASF GitHub Bot commented on STORM-1383:
---------------------------------------
Github user d2r commented on the pull request:
https://github.com/apache/storm/pull/938#issuecomment-165537482
storm-core jdk7 ok
storm-core jdk8 failed with a crash in
[messaging-test](https://travis-ci.org/apache/storm/jobs/97494930#L1315). It looks
like the backpressure code wrapped an InterruptedException in a RuntimeException;
I will look into that.
!storm-core: no unexpected failures in either jdk7 or jdk8
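
For illustration only, here is a minimal Java sketch of how a caller might recognize an InterruptedException that has been wrapped in a RuntimeException, as described in the comment above. The class and method names (`InterruptUnwrap`, `causedByInterrupt`) are hypothetical and are not taken from the Storm code base or from the pull request.

```java
// Hypothetical helper, not part of Apache Storm.
final class InterruptUnwrap {
    private InterruptUnwrap() {}

    /**
     * Returns true if the given throwable, or any throwable in its cause
     * chain, is an InterruptedException.
     */
    static boolean causedByInterrupt(Throwable t) {
        // Bound the walk so a malformed, cyclic cause chain cannot loop forever.
        for (int depth = 0; t != null && depth < 64; depth++) {
            if (t instanceof InterruptedException) {
                return true;
            }
            t = t.getCause();
        }
        return false;
    }
}
```

A test or shutdown path could call `InterruptUnwrap.causedByInterrupt(e)` on a caught RuntimeException to decide whether it was an expected interrupt or a real failure.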
> Supervisors should not crash if nimbus is unavailable
> -----------------------------------------------------
>
> Key: STORM-1383
> URL: https://issues.apache.org/jira/browse/STORM-1383
> Project: Apache Storm
> Issue Type: Improvement
> Components: storm-core
> Affects Versions: 0.11.0
> Reporter: Derek Dagit
> Assignee: Derek Dagit
>
> In cases of maintenance or unexpected downtime of nimbus nodes, supervisors
> will crash in a loop. This can cause a lot of confusion among users
> (supervisors crash repeatedly) and admins (monitoring/alerting triggered for
> the entire cluster).
> Supervisors periodically check with nimbus to synchronize blob versions, and
> as part of this, a connection is made to the leader nimbus daemon. Formerly,
> supervisors did not periodically contact nimbus, and so nimbus downtime did
> not cascade to cluster-wide supervisor failures.
> It might be nice to handle the case when nimbus cannot be contacted, and
> continue in the normal loop.
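
The behaviour requested above could be sketched roughly as follows. This is not the actual supervisor code or the fix in the pull request: `NimbusSync`, `syncOnce`, and `runLoop` are hypothetical names, and catching every `Exception` is a simplification. A real change would more likely catch only the connection-related failures.

```java
// Hypothetical sketch, not part of Apache Storm.
final class BlobSyncSketch {
    private BlobSyncSketch() {}

    /** Hypothetical stand-in for the periodic blob-version sync with nimbus. */
    interface NimbusSync {
        void syncBlobs() throws Exception;
    }

    /**
     * Runs one sync attempt. Returns true on success and false if the attempt
     * failed, for example because nimbus could not be contacted. A failure is
     * logged instead of being allowed to crash the calling loop.
     */
    static boolean syncOnce(NimbusSync nimbus) {
        try {
            nimbus.syncBlobs();
            return true;
        } catch (InterruptedException e) {
            // Preserve the interrupt so the loop can stop cleanly.
            Thread.currentThread().interrupt();
            return false;
        } catch (Exception e) {
            System.err.println("Could not sync blobs with nimbus, will retry: " + e);
            return false;
        }
    }

    /**
     * Runs up to the given number of iterations, stopping early if the thread
     * is interrupted. Returns the number of attempts that did not succeed
     * (an interrupted attempt counts as one).
     */
    static int runLoop(NimbusSync nimbus, int iterations, long sleepMillis) {
        int failures = 0;
        for (int i = 0; i < iterations && !Thread.currentThread().isInterrupted(); i++) {
            if (!syncOnce(nimbus)) {
                failures++;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return failures;
    }
}
```

With this shape, a nimbus outage shows up as logged retries rather than as repeated supervisor crashes, and the loop resumes normal syncing once nimbus is reachable again.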