[ 
https://issues.apache.org/jira/browse/STORM-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuzhao Chen updated STORM-2044:
-------------------------------
    Summary: Nimbus should not make assignments crazily when Pacemaker goes 
down  (was: Nimbus should not make assignments crazy when Pacemaker down)

> Nimbus should not make assignments crazily when Pacemaker goes down
> -------------------------------------------------------------------
>
>                 Key: STORM-2044
>                 URL: https://issues.apache.org/jira/browse/STORM-2044
>             Project: Apache Storm
>          Issue Type: Improvement
>          Components: storm-core
>    Affects Versions: 1.0.2
>         Environment: CentOS 6.5
>            Reporter: Yuzhao Chen
>              Labels: patch
>             Fix For: 1.1.0
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> Pacemaker is currently a stand-alone service with no HA. When it goes down,
> all worker heartbeats are lost. If there are dozens of GB of heartbeats,
> recovery takes a long time even if Pacemaker comes back up immediately.
> Until the worker heartbeats are fully restored, Nimbus treats these workers
> as dead because their heartbeats have timed out, and it keeps reassigning
> these "dead" workers until heartbeats return to normal. As a result, many
> topologies are reassigned during recovery and throughput drops sharply.
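
The failure mode described above can be illustrated with a minimal sketch. This is not Storm's actual code: the function name, the worker ids, the timestamps, and the 30-second timeout are all hypothetical, chosen only to show how a purely timeout-based liveness check flags a running worker as dead when its heartbeat record has not yet been restored.

```python
from typing import Dict, List, Optional


def workers_considered_dead(
    last_heartbeat: Dict[str, Optional[float]],
    now: float,
    timeout_secs: float,
) -> List[str]:
    """Return the ids of workers whose heartbeat is missing or too old.

    Hypothetical helper: a worker with no stored heartbeat (None) is
    indistinguishable from one that really stopped heartbeating.
    """
    dead = []
    for worker_id, timestamp in sorted(last_heartbeat.items()):
        if timestamp is None or now - timestamp > timeout_secs:
            dead.append(worker_id)
    return dead


# Heartbeat store is healthy: both workers reported recently.
healthy = {"worker-1": 98.0, "worker-2": 99.0}

# Heartbeat store is still recovering: worker-1 is running, but its
# heartbeat record has not been restored yet.
recovering = {"worker-1": None, "worker-2": 99.0}

print(workers_considered_dead(healthy, now=100.0, timeout_secs=30.0))
print(workers_considered_dead(recovering, now=100.0, timeout_secs=30.0))
```

In the second call the check reports worker-1 as dead even though only its heartbeat record is missing, which is the condition that leads to repeated reassignment during recovery.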



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
