GitHub user markap14 commented on the issue:
https://github.com/apache/nifi/pull/2646
@mcgilman we do indeed implement ConnectionStateListener, but only to log the
state change and then call super.stateChanged(). On SUSPENDED or LOST, that
call throws CancelLeadershipException, which in turn is supposed to interrupt
our listener. This follows the "Error Handling" guidance provided by Apache
Curator:
https://curator.apache.org/curator-recipes/leader-election.html
So we are handling the SUSPENDED and LOST scenarios as recommended, and this
works 99% of the time. Occasionally, though, the thread is not interrupted, so
the node believes that it still holds the lock. When this happens, it's not
clear whether the thread simply wasn't interrupted for some reason, whether the
SUSPENDED/LOST notification was never received, or whether something else
prevents our ElectionListener from being interrupted.
That's why I went with periodically polling ZooKeeper to check the state. That
way, whatever the reason the thread was not interrupted, we will still break
out. If you think it makes sense, though, we could also detect the LOST state
specifically and have that trigger us to leave the election, in addition to
polling.
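
As a rough illustration of the polling idea above (not the actual NiFi code),
here is a minimal JDK-only sketch. The names `PollingLeaderLoop`, `ExitReason`,
and `stillLeader` are hypothetical; `stillLeader` stands in for whatever query
asks ZooKeeper whether this node still holds the lock.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch, not the NiFi implementation: hold leadership until the
// thread is interrupted OR a periodic check reports that the lock is gone.
class PollingLeaderLoop {
    enum ExitReason { INTERRUPTED, LOCK_LOST }

    // Assumed stand-in for "ask ZooKeeper whether we still hold the lock".
    private final BooleanSupplier stillLeader;
    private final long pollMillis;

    PollingLeaderLoop(BooleanSupplier stillLeader, long pollMillis) {
        this.stillLeader = stillLeader;
        this.pollMillis = pollMillis;
    }

    ExitReason holdLeadership() {
        while (true) {
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Expected path: the listener was interrupted on SUSPENDED/LOST.
                Thread.currentThread().interrupt(); // preserve interrupt status
                return ExitReason.INTERRUPTED;
            }
            // Fallback path: no interrupt arrived, so verify the state ourselves.
            if (!stillLeader.getAsBoolean()) {
                return ExitReason.LOCK_LOST;
            }
        }
    }
}
```

Either exit path gives up leadership, so a missed interrupt costs at most one
polling interval before the node notices that it no longer holds the lock.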
---