Yuzhao Chen created STORM-2043:
----------------------------------

             Summary: Nimbus should not make assignments crazy when Pacemaker down
                 Key: STORM-2043
                 URL: https://issues.apache.org/jira/browse/STORM-2043
             Project: Apache Storm
          Issue Type: Improvement
          Components: storm-core
    Affects Versions: 1.0.0, 1.0.1, 1.0.2, 1.1.0
         Environment: CentOS 6.5 
            Reporter: Yuzhao Chen
             Fix For: 1.0.2


When Pacemaker goes down, all worker heartbeats are lost. Even if Pacemaker
comes back up immediately, the heartbeats can take a long time to recover,
especially when the heartbeat data occupies dozens of GB of memory. Until the
worker heartbeats are complete again, Nimbus considers the workers dead
(heartbeat timeout) and aggressively reassigns them. The workers are actually
healthy, so the reassignments keep cycling until the Pacemaker heartbeats
recover, and throughout this period the throughput of every topology drops.
We should avoid this, because Pacemaker has no HA.
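One possible mitigation, sketched below under stated assumptions, is for Nimbus to withhold reassignment decisions while the heartbeat store is still refilling. The class `HeartbeatGuard`, its method `workersToReassign`, the "store up since" timestamp and the mass-timeout ratio are all hypothetical illustrations, not part of Storm's API or the eventual fix for this issue. The sketch applies two heuristics: (1) make no reassignments within one heartbeat-timeout window after Pacemaker comes back up, and (2) if a large fraction of workers time out at the same moment, suspect the heartbeat store rather than the workers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical guard (not part of Storm) deciding which workers Nimbus may
 * safely reassign after heartbeat timeouts.
 */
final class HeartbeatGuard {
    private final long timeoutMs;          // assumed worker heartbeat timeout
    private final double massTimeoutRatio; // assumed: timed-out fraction that suggests a store outage

    HeartbeatGuard(long timeoutMs, double massTimeoutRatio) {
        this.timeoutMs = timeoutMs;
        this.massTimeoutRatio = massTimeoutRatio;
    }

    /**
     * @param lastHeartbeatMs worker id -> timestamp of the last heartbeat seen (null if none)
     * @param storeUpSinceMs  hypothetical: when the heartbeat store (e.g. Pacemaker) last came up
     * @param nowMs           current time
     * @return worker ids that should be reassigned
     */
    List<String> workersToReassign(Map<String, Long> lastHeartbeatMs, long storeUpSinceMs, long nowMs) {
        List<String> timedOut = new ArrayList<>();
        // Heuristic 1: within one timeout window of the store coming back,
        // missing heartbeats may simply not have been re-sent yet.
        if (nowMs - storeUpSinceMs < timeoutMs) {
            return timedOut;
        }
        for (Map.Entry<String, Long> e : lastHeartbeatMs.entrySet()) {
            Long last = e.getValue();
            if (last == null || nowMs - last > timeoutMs) {
                timedOut.add(e.getKey());
            }
        }
        // Heuristic 2: if most workers time out at once, blame the store, not the workers.
        if (!lastHeartbeatMs.isEmpty()
                && (double) timedOut.size() / lastHeartbeatMs.size() >= massTimeoutRatio) {
            return new ArrayList<>();
        }
        return timedOut;
    }
}
```

With a 30 s timeout and a ratio of 0.5, a single stale worker in an otherwise healthy cluster is still reassigned, while a cluster-wide gap in heartbeats (the Pacemaker outage described above) produces no reassignments. The trade-off is that a genuine mass failure of workers would also be masked, so the ratio would need tuning for real deployments.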



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
