[ https://issues.apache.org/jira/browse/STORM-256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rick Kellogg updated STORM-256:
-------------------------------
Summary: storm rebalance bug caused supervisor miss topology’s assignment
(was: storm relanace bug caused supervisor miss topology’s assignment)
> storm rebalance bug caused supervisor miss topology’s assignment
> ----------------------------------------------------------------
>
> Key: STORM-256
> URL: https://issues.apache.org/jira/browse/STORM-256
> Project: Apache Storm
> Issue Type: Bug
> Reporter: vinceyang
>
> In our cluster of 300+ nodes, a rebalance occasionally (with low
> probability) causes a supervisor to miss a topology's assignment.
> The process is as follows:
> Nimbus rebalance:
> 1. Receive the rebalance command.
> 2. Change the job status in ZooKeeper to "KILLED".
> 3. Compute the new assignment and write it to ZooKeeper.
> 4. Change the job status to "ACTIVE".
> Supervisor rebalance (the supervisor watches the topology's assignment
> node in ZooKeeper):
> 1. When the topology's status changes to "KILLED", the supervisor
> receives the change and calls the mk-synchronize-supervisor function.
> 2. mk-synchronize-supervisor tries to read the assignment from
> ZooKeeper (for simplicity, call it assignment-A). But before the read
> completes, the topology's status has already changed to "ACTIVE" and
> the job's assignment has changed to assignment-B, so
> mk-synchronize-supervisor reads only assignment-B and misses
> assignment-A.
> 3. Because assignment-A is missed, the rebalance does not take effect
> on this supervisor, and the whole topology does not work.
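The interleaving described in the nimbus and supervisor steps above can be reproduced in a few lines. This is a single-threaded toy model, not Storm code: `ToyStore`, `nimbus_rebalance`, the `topo/status` and `topo/assignment` keys, and placing the watch on the status key are all hypothetical simplifications, and it assumes assignment-A is what the store holds when the "KILLED" notification fires. It relies on two properties of classic ZooKeeper watches: they fire once, and the notification carries only the event type and path, so the handler has to read the current state afterwards.

```python
import queue


class ToyStore:
    """Hypothetical in-memory stand-in for ZooKeeper with one-shot watches."""

    def __init__(self):
        self.data = {}
        self.watches = {}  # key -> callbacks registered via get(..., watch=...)

    def get(self, key, watch=None):
        # Register a one-time watch, then return only the *current* value.
        if watch is not None:
            self.watches.setdefault(key, []).append(watch)
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value
        # One-shot semantics: fire every watch on this key, then drop them.
        for callback in self.watches.pop(key, []):
            callback(key)


def nimbus_rebalance(store, topology, new_assignment):
    """Nimbus steps 2-4 from the report, in order (names are hypothetical)."""
    store.set(f"{topology}/status", "KILLED")            # step 2
    store.set(f"{topology}/assignment", new_assignment)  # step 3
    store.set(f"{topology}/status", "ACTIVE")            # step 4


store = ToyStore()
store.set("topo/assignment", "assignment-A")  # assumed state when KILLED fires
store.set("topo/status", "ACTIVE")

events = queue.Queue()  # notifications waiting for the supervisor
observed = []           # assignments the supervisor actually read


def on_status_change(key):
    # Like a ZooKeeper WatchedEvent, this says *that* the node changed, not to what.
    events.put(key)


store.get("topo/status", watch=on_status_change)

# Step 2 fires the supervisor's watch, but steps 3 and 4 complete
# before the supervisor gets around to reading.
nimbus_rebalance(store, "topo", "assignment-B")

# Supervisor side (the role mk-synchronize-supervisor plays in the report).
while not events.empty():
    events.get()
    observed.append(store.get("topo/assignment"))
    status = store.get("topo/status", watch=on_status_change)

print(observed)  # ['assignment-B']
print(status)    # ACTIVE
```

Because the callback only records that a change happened, every write that lands before the handler runs collapses into one read of the latest state: the supervisor never sees "KILLED" or assignment-A. Whether that alone explains the lost rebalance depends on how mk-synchronize-supervisor compares what it reads with what it is already running, which this model does not attempt to capture.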
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)