GitHub user jiajunwang opened a pull request:
https://github.com/apache/helix/pull/297
ZkClient related improvements
We identified two potential issues that may cause a retrying ZK operation
to fail unexpectedly. These commits fix both.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jiajunwang/helix master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/helix/pull/297.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #297
----
commit 4c4891197103bd3fe4660fdeca40b537de649b97
Author: Jiajun Wang <jjwang@...>
Date: 2019-01-08T23:27:21Z
Add ZkConnection.reconnect to avoid NPE when resetting ZkConnection.
In the old version, reconnect was done by closing and then connecting. In
between, the zookeeper ref is null. This may cause an NPE, which terminates
the ZkClient operation retry earlier than expected.
This change copies the existing ZkConnection and adds reconnect. The new
method ensures reconnecting without leaving the field empty.
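The idea can be illustrated with a minimal sketch. It is not the code from
this commit: Handle, ReconnectableConnection and the Supplier-based factory
are hypothetical names standing in for the ZooKeeper client and
ZkConnection. The point is only the ordering: the new handle is published
before the old one is closed, so a concurrent reader never observes null.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical stand-in for a ZooKeeper client handle. */
interface Handle {
    void close();
}

/** Sketch of a connection wrapper whose handle is never null across a reconnect. */
class ReconnectableConnection {
    private final Lock lock = new ReentrantLock();
    private final Supplier<Handle> factory; // assumption: creates a fresh, connected handle
    private volatile Handle handle;

    ReconnectableConnection(Supplier<Handle> factory) {
        this.factory = factory;
        this.handle = factory.get();
    }

    /** Readers (for example, a retry loop) always see a non-null handle. */
    Handle getHandle() {
        return handle;
    }

    /** Swap in a new handle first, then close the previous one. */
    void reconnect() {
        lock.lock();
        try {
            Handle previous = handle;
            handle = factory.get(); // publish the replacement before closing
            previous.close();
        } finally {
            lock.unlock();
        }
    }
}
```

With a close-then-connect sequence, the field would be empty between the two
steps; here the only window is one in which a reader may still hold the
previous handle.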
commit 879abf7b59c7b029e5a0dec21691b69a50722d27
Author: Jiajun Wang <jjwang@...>
Date: 2019-01-11T23:53:17Z
Improve the callback handler behavior regarding batch mode event handling
when the handler is reset.
For new session handling, the callback handler should not interrupt the
currently executing process. Doing so could cause a pending request to fail
unexpectedly. Note that after the change, closing a callback will still
interrupt the thread to avoid a thread leak.
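The distinction between reset and close can be sketched as follows. This is
an illustration under assumptions, not the handler from this commit:
BatchEventHandler, submit, reset and close are hypothetical names. In the
sketch, reset only drops queued events and leaves the running one alone,
while close interrupts the worker so the thread exits.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical batch-mode handler: one worker thread drains queued events. */
class BatchEventHandler {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final Thread worker;
    private volatile boolean closed = false;

    BatchEventHandler() {
        worker = new Thread(() -> {
            try {
                while (!closed) {
                    queue.take().run(); // blocks until an event is available
                }
            } catch (InterruptedException e) {
                // close() interrupted the worker; fall through and exit
            }
        }, "batch-event-handler");
        worker.setDaemon(true);
        worker.start();
    }

    void submit(Runnable event) {
        queue.add(event);
    }

    /** New-session reset: discard queued events, do not interrupt the running one. */
    void reset() {
        queue.clear();
    }

    /** Close: interrupt the worker so the thread does not leak. */
    void close() {
        closed = true;
        worker.interrupt();
    }

    boolean isWorkerAlive() {
        return worker.isAlive();
    }

    void awaitTermination(long millis) throws InterruptedException {
        worker.join(millis);
    }
}
```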
----