vinceyang created STORM-256:
-------------------------------
Summary: Storm rebalance bug causes supervisor to miss topology's
assignment
Key: STORM-256
URL: https://issues.apache.org/jira/browse/STORM-256
Project: Apache Storm (Incubating)
Issue Type: Bug
Reporter: vinceyang
In our cluster of 300+ nodes, a rebalance occasionally (with low probability)
causes a supervisor to miss a topology's assignment.
The process is as follows:
Nimbus rebalance:
1. Nimbus receives the rebalance command.
2. Nimbus changes the job status in ZooKeeper to "KILLED".
3. Nimbus computes the new assignment and writes it to ZooKeeper.
4. Nimbus changes the job status to "ACTIVE".
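
The write order in the nimbus steps above can be sketched as follows. This is only an illustration, not Storm's actual code: FakeZk is an in-memory stand-in for ZooKeeper, and the paths, the topology id "topo-1", and the function names are all hypothetical.

```python
class FakeZk:
    """In-memory stand-in for ZooKeeper (hypothetical); records writes in order."""

    def __init__(self):
        self.data = {}
        self.log = []

    def write(self, path, value):
        self.data[path] = value
        self.log.append((path, value))


def nimbus_rebalance(zk, topology_id, compute_assignment):
    """Replay steps 2-4 of the nimbus rebalance as three separate writes."""
    status_path = f"/status/{topology_id}"            # hypothetical path
    assignment_path = f"/assignments/{topology_id}"   # hypothetical path
    zk.write(status_path, "KILLED")                   # step 2
    zk.write(assignment_path, compute_assignment())   # step 3
    zk.write(status_path, "ACTIVE")                   # step 4


zk = FakeZk()
nimbus_rebalance(zk, "topo-1", lambda: "assignment-B")
for path, value in zk.log:
    print(path, value)
```

The point of the sketch is that the three writes are independent updates, so a reader can observe the state at any point between them.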
Supervisor rebalance (the supervisor watches the topology assignment node in
ZooKeeper):
1. When the topology's status changes to "KILLED", the supervisor receives the
change notification and calls the mk-synchronize-supervisor function.
2. mk-synchronize-supervisor tries to read the assignment from ZooKeeper; for
simplicity, call it assignment-A. But before the read completes, the topology's
status has already changed to "ACTIVE" and the job's assignment has changed to
assignment-B, so mk-synchronize-supervisor reads only assignment-B and misses
assignment-A.
3. Because assignment-A was missed, the rebalance does not take effect on this
supervisor, and the whole topology does not work.
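
The interleaving described above can be reproduced deterministically with a small simulation. This is a sketch under assumptions, not Storm's implementation: FakeZk, the paths, and synchronize_supervisor (standing in for mk-synchronize-supervisor) are hypothetical, and the watch is modeled as a one-shot notification that is delivered after nimbus has finished all of its writes.

```python
from collections import deque

STATUS = "/status/topo-1"            # hypothetical path
ASSIGNMENT = "/assignments/topo-1"   # hypothetical path


class FakeZk:
    """In-memory stand-in for ZooKeeper with delayed, one-shot watches."""

    def __init__(self, initial):
        self.data = dict(initial)
        self.watchers = []
        self.pending = deque()   # notifications fired but not yet delivered

    def watch(self, callback):
        self.watchers.append(callback)

    def write(self, path, value):
        self.data[path] = value
        # Queue the notification instead of running it now; the watch is
        # one-shot, so it is cleared once it has fired.
        self.pending.extend(self.watchers)
        self.watchers = []

    def deliver(self):
        while self.pending:
            self.pending.popleft()()


zk = FakeZk({STATUS: "ACTIVE", ASSIGNMENT: "assignment-A"})
seen = []  # (status, assignment) pairs observed by the supervisor


def synchronize_supervisor():
    # Reads whatever is current at delivery time, not at notification time.
    seen.append((zk.data[STATUS], zk.data[ASSIGNMENT]))


zk.watch(synchronize_supervisor)

# Nimbus completes its whole sequence before the notification is delivered.
zk.write(STATUS, "KILLED")
zk.write(ASSIGNMENT, "assignment-B")
zk.write(STATUS, "ACTIVE")

zk.deliver()
print(seen)  # [('ACTIVE', 'assignment-B')]
```

In this run the supervisor never observes the "KILLED" status or assignment-A; it only sees the final state, which matches the miss described in step 2.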
--
This message was sent by Atlassian JIRA
(v6.2#6252)