[
https://issues.apache.org/jira/browse/STORM-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yuzhao Chen updated STORM-2044:
-------------------------------
Summary: Nimbus should not make assignments crazy when Pacemaker down
(was: Nimb)
> Nimbus should not make assignments crazy when Pacemaker down
> ------------------------------------------------------------
>
> Key: STORM-2044
> URL: https://issues.apache.org/jira/browse/STORM-2044
> Project: Apache Storm
> Issue Type: Improvement
> Components: storm-core
> Affects Versions: 1.0.2
> Environment: CentOS 6.5
> Reporter: Yuzhao Chen
> Labels: patch
> Fix For: 1.1.0
>
> Original Estimate: 672h
> Remaining Estimate: 672h
>
> Pacemaker is currently a stand-alone service with no HA. When it goes down, all
> worker heartbeats are lost. Even if Pacemaker comes back up immediately,
> recovery takes a long time when there are dozens of GBs of heartbeats.
> Until the worker heartbeats are fully restored, Nimbus treats these workers
> as dead because their heartbeats have timed out, and it keeps reassigning
> these "dead" workers until heartbeats return to normal. As a result, many
> topologies are reassigned during the recovery window and throughput drops
> sharply.
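
The failure mode described above can be illustrated with a minimal sketch. This is not Storm's actual Nimbus code: the function name, worker ids, timestamps, and the 30-second timeout are all hypothetical, chosen only to show how a timeout-based liveness check misreads an emptied heartbeat store as a set of dead workers.

```python
def find_timed_out_workers(last_heartbeat, now, timeout_secs):
    """Return ids of workers whose heartbeat is missing or older than the timeout.

    last_heartbeat maps a worker id to the timestamp (in seconds) of its most
    recent heartbeat, or None if the heartbeat store has no record for it.
    """
    dead = []
    for worker_id, ts in last_heartbeat.items():
        if ts is None or now - ts > timeout_secs:
            dead.append(worker_id)
    return sorted(dead)


# Healthy cluster: every worker reported within the timeout.
heartbeats = {"worker-1": 100, "worker-2": 101, "worker-3": 99}
before_outage = find_timed_out_workers(heartbeats, now=105, timeout_secs=30)

# The heartbeat store restarted empty. Only worker-1 has re-reported so far;
# the other workers are still running, but their heartbeats are not restored.
after_restart = {"worker-1": 140, "worker-2": None, "worker-3": None}
during_recovery = find_timed_out_workers(after_restart, now=141, timeout_secs=30)

print(before_outage)
print(during_recovery)
```

In this sketch the second call flags worker-2 and worker-3 even though nothing happened to them, which is the situation in which a scheduler would reassign healthy workers.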
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)