[
https://issues.apache.org/jira/browse/FLINK-27848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matthias Pohl reopened FLINK-27848:
-----------------------------------
Assignee: Matthias Pohl (was: Weijie Guo)
I'm reopening this issue to provide forward(?)ports for 1.16 and 1.17.
Refactoring the leader election for FLIP-285/FLINK-26522 is kind of tricky. I'm
trying to slice the code changes into meaningful commits (and ideally dedicated
PRs) to make the review process easier.
I ran into this issue while refactoring the code and merging classes into one,
which also required adapting tests. This revealed the inconsistency/bug in the
ZooKeeperLeaderElectionDriver implementation. Merging the bugfixes into 1.17
and 1.16 makes the other changes more reasonable/consistent.
> ZooKeeperLeaderElectionDriver keeps writing leader information, using up zxid
> -----------------------------------------------------------------------------
>
> Key: FLINK-27848
> URL: https://issues.apache.org/jira/browse/FLINK-27848
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.15.0
> Reporter: Xintong Song
> Assignee: Matthias Pohl
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.15.1
>
>
> After a leadership change, the new leader may keep writing its (identical)
> information to ZK, quickly using up the zxid on ZK.
> The problem is that, in
> {{ZooKeeperLeaderElectionDriver#retrieveLeaderInformationFromZooKeeper}},
> {{leaderElectionEventHandler.onLeaderInformationChange(LeaderInformation.empty())}}
> is called regardless of whether {{childData}} is {{null}}. When it is non-null,
> this causes the driver to keep re-writing the leader information to ZK.
> The problem was introduced in FLINK-24038, and only affects the legacy
> {{ZooKeeperHaServices}}. Thus, only 1.15 is affected.
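
To illustrate the feedback loop described in the issue, here is a minimal toy model in Java. It does not use Flink or Curator: ToyLeader, retrieveBuggy, retrieveFixed and countWrites are hypothetical names made up for this sketch, and the "fixed" variant is only one plausible correction (report empty only when the node has no data), not necessarily what the actual patch does. The model assumes that a leader rewrites its information whenever it is told the stored information is empty or different, and that every write triggers another retrieval.

```java
import java.util.Optional;

/** Toy model of the rewrite loop; none of these types are Flink's real classes. */
final class LeaderInfoLoopSketch {

    /** Stand-in for the leader: rewrites its info whenever it observes a mismatch. */
    static final class ToyLeader {
        final String confirmedInfo = "leader-address";
        // Data currently stored in the (simulated) ZooKeeper node.
        Optional<String> znode = Optional.of("leader-address");
        // Each write would consume a zxid on a real ensemble.
        int writes = 0;

        /** Returns true if a write happened, which would fire another change event. */
        boolean onLeaderInformationChange(Optional<String> observed) {
            if (observed.isPresent() && observed.get().equals(confirmedInfo)) {
                return false; // stored data is already correct, nothing to do
            }
            znode = Optional.of(confirmedInfo);
            writes++;
            return true;
        }
    }

    /** Buggy variant: always reports "empty", even when the node holds data. */
    static boolean retrieveBuggy(ToyLeader leader) {
        return leader.onLeaderInformationChange(Optional.empty());
    }

    /** Assumed fix: report what is actually stored (empty only if nothing is there). */
    static boolean retrieveFixed(ToyLeader leader) {
        return leader.onLeaderInformationChange(leader.znode);
    }

    /** Processes up to maxEvents change events and returns the number of writes. */
    static int countWrites(boolean buggy, int maxEvents) {
        ToyLeader leader = new ToyLeader();
        boolean pending = true; // first event, e.g. right after a leadership change
        for (int i = 0; i < maxEvents && pending; i++) {
            pending = buggy ? retrieveBuggy(leader) : retrieveFixed(leader);
        }
        return leader.writes;
    }

    public static void main(String[] args) {
        System.out.println("buggy writes: " + countWrites(true, 10));
        System.out.println("fixed writes: " + countWrites(false, 10));
    }
}
```

In this model the buggy variant performs one write per processed event (the loop only stops because of the maxEvents cap), while the assumed fix performs none because the stored information already matches.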
--
This message was sent by Atlassian Jira
(v8.20.10#820010)