[
https://issues.apache.org/jira/browse/STORM-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231311#comment-15231311
]
ASF GitHub Bot commented on STORM-1696:
---------------------------------------
GitHub user zhuoliu opened a pull request:
https://github.com/apache/storm/pull/1320
[STORM-1696]-1.x-branch status not sync if zk fails in backpressure
This issue can cause a topology to be blocked.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zhuoliu/storm STORM-1696-1.x-branch
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/storm/pull/1320.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1320
----
commit 9271056b22ab5c734157a9ca1f3f4ab9a28d4b4b
Author: zhuol <[email protected]>
Date: 2016-04-07T23:12:33Z
[STORM-1696]-1.x-branch status not sync if zk fails in backpressure
----
> Backpressure flag not sync if zookeeper connection errors
> ---------------------------------------------------------
>
> Key: STORM-1696
> URL: https://issues.apache.org/jira/browse/STORM-1696
> Project: Apache Storm
> Issue Type: Bug
> Affects Versions: 1.0.0, 2.0.0
> Reporter: Zhuo Liu
> Assignee: Zhuo Liu
> Priority: Blocker
> Fix For: 1.0.0, 2.0.0
>
>
> When a zk exception occurs during worker-backpressure!, the worker is left
> in a bad state that can block the topology from running normally.
> The root cause is in worker/mk-backpressure-handler: if worker-backpressure!
> fails once due to a zk connection exception, the local flag has already been
> updated even though the zk write did not go through. The next time this
> method is called by WorkerBackpressureThread, the check
> (when (not= prev-backpressure-flag curr-backpressure-flag) ...) will never
> be true, so the remote zk node cannot be synced with the local state.
> This also explains why no problem shows up when testing in a stable
> environment where zk never fails.
> The solution is straightforward: first change the zk status and, only if
> that succeeds, change the local status.
> This fixes the hidden bug and removes redundant flags in executor-data and
> worker-data, since the executor status can be read directly from the
> "_throttleOn" boolean in the DisruptorQueue.
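
The ordering described in the issue (write to zk first, update the local flag only on success) can be illustrated with a minimal sketch. This is not the code from the pull request: RemoteFlagStore and BackpressureSyncSketch are hypothetical names standing in for the zk-backed cluster state and the worker's backpressure handler, and the sketch is in Java rather than the Clojure of worker.clj.

```java
// Hypothetical stand-in for the zk-backed cluster state; not Storm's API.
interface RemoteFlagStore {
    void setBackpressure(boolean on) throws Exception;
}

class BackpressureSyncSketch {
    private final RemoteFlagStore remote;
    // Last value known to have been written to the remote store.
    private boolean syncedFlag = false;

    BackpressureSyncSketch(RemoteFlagStore remote) {
        this.remote = remote;
    }

    // Returns true if the remote store and the local record agree after the call.
    boolean sync(boolean currentFlag) {
        if (syncedFlag == currentFlag) {
            return true; // nothing to push
        }
        try {
            remote.setBackpressure(currentFlag); // remote write comes first
        } catch (Exception e) {
            // Local record is left untouched, so the next call still sees a
            // difference and retries the remote write.
            return false;
        }
        syncedFlag = currentFlag; // updated only after the remote write succeeded
        return true;
    }

    boolean syncedFlag() {
        return syncedFlag;
    }
}
```

If the local record were updated before the remote write, a single failed write would make the two values look equal on every later call, which is the stuck state the issue describes. With the order above, a caller that invokes sync periodically recovers as soon as the remote store becomes reachable again.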
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)