[
https://issues.apache.org/jira/browse/FLINK-26522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654541#comment-17654541
]
Matthias Pohl edited comment on FLINK-26522 at 1/5/23 9:51 AM:
---------------------------------------------------------------
I had to rework the proposal and added version 2 of the diagram as an
[attachment
leaderelection-FLINK-26522.class.v2.svg|^leaderelection-FLINK-26522.class.v2.svg].
When drafting the proposal, I ran into issues with the previous interfaces:
The contender does not necessarily own the LeaderElectionService, which means
that removing/unregistering the contender is not that easy because the owner of
the LeaderElectionService (who usually calls stop()) would have to hold a
reference to the LeaderContender to trigger the removal/unregistering of the
instance from the LeaderElectionService.
The new proposal introduces a new interface, {{LeaderElection}}. Version 1
didn't consider that the leadership check needs to be done per contender, i.e.
{{LeaderElectionService.confirmLeadership(UUID, String)}} and
{{LeaderElectionService.hasLeadership(UUID)}} need to know which contender
is being checked. We could extend the signatures of those
methods to pass in the {{LeaderContender}} (analogously to how it's already
done for the {{LeaderElectionService.remove(LeaderContender)}} method).
Introducing the {{LeaderElection}} interface helps us come up with a cleaner
approach: The LeaderContender is only interested in confirming the leadership,
checking for leadership and closing the session.
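To make the idea concrete, here is a minimal in-memory sketch of a per-contender handle. Only the three operations (confirming leadership, checking for leadership, closing the session) and the {{confirmLeadership(UUID, String)}} / {{hasLeadership(UUID)}} signatures come from the description above. Everything else is an assumption for illustration and not Flink's actual API: the names {{SketchLeaderElectionService}}, {{register}}, {{onLeadershipAcquired}} and {{RecordingContender}}, the simplified {{LeaderContender}}, and the choice to issue one random session ID per contender.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

// Simplified stand-in for illustration; not the real Flink interface.
interface LeaderContender {
    void grantLeadership(UUID leaderSessionId);
    void revokeLeadership();
}

// Per-contender handle: the only part of the election a contender needs to see.
interface LeaderElection extends AutoCloseable {
    void confirmLeadership(UUID leaderSessionId, String leaderAddress);
    boolean hasLeadership(UUID leaderSessionId);
    @Override
    void close(); // closes the session, i.e. unregisters this handle's contender
}

// Hypothetical toy service that keeps all state in memory.
class SketchLeaderElectionService {
    // Session ID issued to each registered contender (null while not leader).
    private final Map<LeaderContender, UUID> issuedSessions = new LinkedHashMap<>();
    private final Map<LeaderContender, String> confirmedAddresses = new LinkedHashMap<>();

    // The returned handle is bound to the contender, so no method needs a
    // LeaderContender parameter to know whose leadership is meant.
    LeaderElection register(LeaderContender contender) {
        issuedSessions.put(contender, null);
        return new LeaderElection() {
            @Override
            public void confirmLeadership(UUID leaderSessionId, String leaderAddress) {
                if (hasLeadership(leaderSessionId)) {
                    confirmedAddresses.put(contender, leaderAddress);
                }
            }

            @Override
            public boolean hasLeadership(UUID leaderSessionId) {
                return leaderSessionId != null
                        && leaderSessionId.equals(issuedSessions.get(contender));
            }

            @Override
            public void close() {
                issuedSessions.remove(contender);
                confirmedAddresses.remove(contender);
            }
        };
    }

    // Simulates the underlying driver acquiring leadership for this process.
    void onLeadershipAcquired() {
        for (LeaderContender contender : new ArrayList<>(issuedSessions.keySet())) {
            UUID sessionId = UUID.randomUUID();
            issuedSessions.put(contender, sessionId);
            contender.grantLeadership(sessionId);
        }
    }

    int registeredContenders() {
        return issuedSessions.size();
    }

    String confirmedAddress(LeaderContender contender) {
        return confirmedAddresses.get(contender);
    }
}

// Hypothetical contender that only remembers the session ID it was granted.
class RecordingContender implements LeaderContender {
    UUID grantedSessionId;

    @Override
    public void grantLeadership(UUID leaderSessionId) {
        grantedSessionId = leaderSessionId;
    }

    @Override
    public void revokeLeadership() {
        grantedSessionId = null;
    }
}
```

In this sketch the contender's owner only has to keep the {{LeaderElection}} handle and call {{close()}} on it. The owner of the service no longer needs a reference to the contender to unregister it, which is the ownership problem described above.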
> Refactoring code for multiple component leader election
> -------------------------------------------------------
>
> Key: FLINK-26522
> URL: https://issues.apache.org/jira/browse/FLINK-26522
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Coordination
> Affects Versions: 1.16.0
> Reporter: Niklas Semmler
> Assignee: Matthias Pohl
> Priority: Major
> Labels: pull-request-available
> Attachments: leaderelection-FLINK-26522.class.svg,
> leaderelection-FLINK-26522.class.v2.svg,
> leaderelection-flink-1.15+.class.svg, leaderelection-flink-1.15-.class.svg
>
>
> The current implementation of the multiple component leader election faces a
> number of issues. These issues mostly stem from an attempt to make the
> multiple leader election process work just the same way as the single
> component leader election.
> An attempt at listing the issues follows:
> * *Naming* MultipleComponentLeaderElectionService has a name similar to
> the LeaderElectionService, but is in fact closer to the LeaderElectionDriver.
> * *Similarity* The interfaces LeaderElectionService, LeaderElectionDriver and
> MultipleComponentLeaderElectionDriver are very similar to each other.
> * *Cyclic dependency* DefaultMultipleComponentLeaderElectionService holds a
> reference to the ZooKeeperMultipleComponentLeaderElectionDriver
> (MultipleComponentLeaderElectionDriver), which in turn holds a reference to
> the DefaultMultipleComponentLeaderElectionService (LeaderLatchListener).
> * *Unclear contract* With single component leader election drivers such as
> ZooKeeperLeaderElectionDriver a call to the LeaderElectionService#stop from
> JobMasterServiceLeadershipRunner#closeAsync implies giving up the leadership
> of the JobMaster. With the multiple component leader election this is no
> longer the case. The leadership is held until the HighAvailabilityServices
> shut down. This logic may be difficult to understand from the perspective of
> one of the components (e.g., the Dispatcher).
> * *Long call hierarchy*
> DefaultLeaderElectionService->MultipleComponentLeaderElectionDriverAdapter->MultipleComponentLeaderElectionService->ZooKeeperMultipleComponentLeaderElectionDriver
> * *Long prefix* "MultipleComponentLeaderElection" is quite a long prefix but
> shared by many classes.
> * *Adapter as primary implementation* All non-testing non-multiple-component
> leadership drivers are deprecated. The primary implementation of
> LeaderElectionDriver is the adapter
> MultipleComponentLeaderElectionDriverAdapter.
> * *Possible redundancy* We currently have similar methods for the Dispatcher,
> ResourceManager, JobMaster and WebMonitorEndpoint. (E.g., for granting
> leadership.) As these methods are called at the same time due to the multiple
> component leader election, it may make sense to combine this logic into a
> single object.