Yuzhao Chen created STORM-2044:
----------------------------------

             Summary: Nimb
                 Key: STORM-2044
                 URL: https://issues.apache.org/jira/browse/STORM-2044
             Project: Apache Storm
          Issue Type: Improvement
          Components: storm-core
    Affects Versions: 1.0.2
         Environment: CentOS 6.5
            Reporter: Yuzhao Chen
             Fix For: 1.1.0


Pacemaker is currently a stand-alone service with no HA. When it goes down, all
worker heartbeats are lost. If there are dozens of GBs of heartbeats, recovery
takes a long time even if Pacemaker comes back up immediately. Until the worker
heartbeats are completely restored, Nimbus considers those workers dead because
their heartbeats have timed out, and it keeps reassigning these "dead" workers
until heartbeats return to normal. As a result, many topologies are reassigned
during the recovery period and throughput drops sharply.
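
The failure mode can be illustrated with a minimal sketch. This is a
hypothetical model, not Storm or Pacemaker code: the function name, the
dictionary used as a heartbeat store, and the 30-second timeout are all
assumptions made for illustration. It shows why a timeout-based liveness check
flags healthy workers as dead while their heartbeats are still missing from a
freshly restarted store.

```python
# Hypothetical model of a timeout-based liveness check (not Storm code).
# The timeout value below is an assumption chosen for the example.
HEARTBEAT_TIMEOUT_SECS = 30


def find_dead_workers(last_heartbeat, now, timeout=HEARTBEAT_TIMEOUT_SECS):
    """Return workers whose heartbeat is missing or older than `timeout`."""
    dead = []
    for worker, ts in sorted(last_heartbeat.items()):
        # A missing entry (None) is indistinguishable from a crashed worker.
        if ts is None or now - ts > timeout:
            dead.append(worker)
    return dead


# Heartbeat store shortly after a restart: only worker-1 has been restored,
# although all three workers are actually still running.
store = {"worker-1": 100, "worker-2": None, "worker-3": None}
print(find_dead_workers(store, now=110))  # ['worker-2', 'worker-3']
```

In this model, every worker whose heartbeat has not yet been restored would be
scheduled for reassignment on each check, which matches the repeated
reassignment described above.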



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
