Yuzhao Chen created STORM-2044:
----------------------------------
Summary: Nimb
Key: STORM-2044
URL: https://issues.apache.org/jira/browse/STORM-2044
Project: Apache Storm
Issue Type: Improvement
Components: storm-core
Affects Versions: 1.0.2
Environment: CentOS 6.5
Reporter: Yuzhao Chen
Fix For: 1.1.0
Pacemaker is currently a stand-alone service with no HA. When it goes down, all
worker heartbeats are lost. Even if Pacemaker comes back up immediately,
recovery takes a long time when there are dozens of GBs of heartbeats. Until
the worker heartbeats are fully restored, Nimbus treats these workers as dead
because their heartbeats have timed out, and it keeps reassigning these "dead"
workers until heartbeats return to normal. As a result, many topologies are
reassigned during the recovery period and throughput drops sharply.
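
To illustrate the failure mode, here is a minimal sketch of a heartbeat timeout
check. It is not Storm's actual code: the function name, the data layout, and
the timeout_secs parameter are assumptions made for illustration only. It shows
why workers whose heartbeats have not yet been restored look dead.

```python
def find_timed_out_workers(last_heartbeat, now, timeout_secs):
    """Return ids of workers whose last known heartbeat is missing or too old.

    last_heartbeat maps a worker id to the timestamp (in seconds) of its most
    recent heartbeat, or None if no heartbeat is currently available.
    All names here are hypothetical; this is not the Nimbus implementation.
    """
    dead = []
    for worker_id, ts in last_heartbeat.items():
        # A missing heartbeat cannot be told apart from a dead worker.
        if ts is None or now - ts > timeout_secs:
            dead.append(worker_id)
    return sorted(dead)


# Hypothetical state shortly after the heartbeat store restarts:
# only w1 has a restored heartbeat, w2 and w3 are still missing.
during_recovery = {"w1": 100, "w2": None, "w3": None}
dead_during_recovery = find_timed_out_workers(during_recovery, now=110, timeout_secs=30)

# Once every heartbeat is back and fresh, no worker is flagged.
after_recovery = {"w1": 100, "w2": 105, "w3": 108}
dead_after_recovery = find_timed_out_workers(after_recovery, now=110, timeout_secs=30)
```

In this sketch w2 and w3 are flagged during recovery even though the workers
themselves are healthy, which is the situation that triggers the repeated
reassignment described above.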