Zhuo Liu created STORM-1696:
-------------------------------

             Summary: Backpressure flag not sync if zookeeper connection errors
                 Key: STORM-1696
                 URL: https://issues.apache.org/jira/browse/STORM-1696
             Project: Apache Storm
          Issue Type: Bug
    Affects Versions: 1.0.0, 2.0.0
            Reporter: Zhuo Liu
            Assignee: Zhuo Liu
            Priority: Blocker
             Fix For: 1.0.0, 2.0.0




When a ZooKeeper exception occurs during worker-backpressure!, the worker can
end up in a bad state that prevents the topology from running normally.

The root cause is in worker/mk-backpressure-handler. If worker-backpressure!
fails once because of a ZooKeeper connection exception, the local backpressure
flag has already been updated. The next time WorkerBackpressureThread calls the
handler, the check (when (not= prev-backpressure-flag curr-backpressure-flag) ...)
therefore never evaluates to true, and the remote ZooKeeper node is never synced
with the local state.
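The failure mode can be illustrated with a minimal Python sketch. It models the
handler logic only and is not Storm's actual Clojure code. FlakyZk, BuggyHandler,
worker_backpressure and run_checks are hypothetical names. The simulated
ConnectionError stands in for the ZooKeeper exception. The sketch also assumes,
for illustration, that the calling thread swallows the error and keeps running.

```python
class FlakyZk:
    """Hypothetical stand-in for the cluster-state client: the first `failures`
    writes raise, simulating a transient ZooKeeper connection error."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.node = False  # backpressure flag as stored in ZooKeeper

    def worker_backpressure(self, flag: bool) -> None:
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("simulated ZooKeeper connection loss")
        self.node = flag


class BuggyHandler:
    """Mirrors the pre-fix ordering: local flag first, ZooKeeper second."""

    def __init__(self, zk: FlakyZk) -> None:
        self.zk = zk
        self.local_flag = False  # worker's local backpressure flag

    def check(self, curr_flag: bool) -> None:
        prev_flag = self.local_flag
        self.local_flag = curr_flag  # local state is overwritten unconditionally
        if prev_flag != curr_flag:
            self.zk.worker_backpressure(curr_flag)  # may raise; never retried


def run_checks(handler, flags):
    """Stands in for the backpressure thread: calls the handler, swallows errors."""
    for flag in flags:
        try:
            handler.check(flag)
        except ConnectionError:
            pass  # the thread keeps running after the failed call


zk = FlakyZk(failures=1)
handler = BuggyHandler(zk)
run_checks(handler, [True, True, True])
print(handler.local_flag, zk.node)  # True False: ZooKeeper never sees the throttle
```

After the single failed write, every later call sees prev_flag == curr_flag, so
the ZooKeeper node stays stale for as long as the throttle state is unchanged.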

This also explains why the problem does not show up when testing in a stable
environment where ZooKeeper never fails.

The solution is straightforward: update the ZooKeeper status first, and change
the local status only if that update succeeds.
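The reordered update can be sketched in Python under the same assumptions. This
is illustrative logic, not the committed Storm patch. FlakyZk, FixedHandler,
worker_backpressure and run_checks are hypothetical names, and the calling thread
is assumed to swallow the error and call the handler again later.

```python
class FlakyZk:
    """Hypothetical stand-in for the cluster-state client: the first `failures`
    writes raise, simulating a transient ZooKeeper connection error."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.node = False  # backpressure flag as stored in ZooKeeper

    def worker_backpressure(self, flag: bool) -> None:
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("simulated ZooKeeper connection loss")
        self.node = flag


class FixedHandler:
    """Writes to ZooKeeper first and records the flag locally only on success."""

    def __init__(self, zk: FlakyZk) -> None:
        self.zk = zk
        self.synced_flag = False  # last value successfully written to ZooKeeper

    def check(self, curr_flag: bool) -> None:
        if self.synced_flag != curr_flag:
            self.zk.worker_backpressure(curr_flag)  # raises on failure...
            self.synced_flag = curr_flag  # ...so this line runs only on success


def run_checks(handler, flags):
    """Stands in for the backpressure thread: calls the handler, swallows errors."""
    for flag in flags:
        try:
            handler.check(flag)
        except ConnectionError:
            pass  # the next call retries because synced_flag was not updated


zk = FixedHandler(FlakyZk(failures=1)).zk
handler = FixedHandler(zk)
run_checks(handler, [True, True])
print(handler.synced_flag, zk.node)  # True True: the second call retries and syncs
```

Because the local flag only tracks what ZooKeeper has actually accepted, a failed
write leaves the two flags different and the next call retries it.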

This fixes the hidden bug and also removes the redundant flags from
executor-data and worker-data, since the executor status can be read directly
from the "_throttleOn" boolean in the DisruptorQueue.
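The Python sketch below shows one way to derive the worker's flag from queue
state instead of keeping separate copies. Queue, Worker, throttle_on and
backpressure_flag are hypothetical names. The throttle_on field plays the role of
DisruptorQueue's "_throttleOn". The rule that the worker is under backpressure if
its transfer queue or any executor queue is throttled is an assumption made for
illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Queue:
    """Hypothetical stand-in for a DisruptorQueue; `throttle_on` models `_throttleOn`."""
    throttle_on: bool = False


@dataclass
class Worker:
    transfer_queue: Queue
    executor_queues: List[Queue] = field(default_factory=list)

    def backpressure_flag(self) -> bool:
        # Assumption: no separate flag is stored; the value is computed on demand
        # from the queues' own throttle state, so it cannot drift out of sync.
        return self.transfer_queue.throttle_on or any(
            q.throttle_on for q in self.executor_queues
        )


worker = Worker(Queue(), [Queue(), Queue()])
print(worker.backpressure_flag())  # False
worker.executor_queues[1].throttle_on = True
print(worker.backpressure_flag())  # True
```

Computing the value from a single source of truth avoids a second copy of the
state that a failed update could leave stale.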




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)