[jira] [Comment Edited] (FLINK-26522) Refactoring code for multiple component leader election

Matthias Pohl (Jira) Thu, 05 Jan 2023 01:52:06 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-26522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654541#comment-17654541
 ]


Matthias Pohl edited comment on FLINK-26522 at 1/5/23 9:51 AM:
---------------------------------------------------------------

I had to rework the proposal and added version 2 of the diagram as an 
[attachment 
leaderelection-FLINK-26522.class.v2.svg|^leaderelection-FLINK-26522.class.v2.svg].
 When drafting the proposal, I ran into issues with the previous interfaces: 
The contender does not necessarily own the LeaderElectionService, which means 
that removing/unregistering the contender is not that easy because the owner of 
the LeaderElectionService (who usually calls stop()) would have to hold a 
reference to the LeaderContender to trigger the removal/unregistering of the 
instance from the LeaderElectionService.

The new proposal introduces a new interface, {{LeaderElection}}. Version 1 
didn't consider that the leadership check needs to be done per contender, i.e. 
{{LeaderElectionService.confirmLeadership(UUID, String)}} and 
{{LeaderElectionService.hasLeadership(UUID)}} need to know which 
contender shall be checked. We could extend the signatures of those 
methods to pass in the {{LeaderContender}} (analogously to how it's already 
done for the {{LeaderElectionService.remove(LeaderContender)}} method). 
Introducing the {{LeaderElection}} interface helps us come up with a cleaner 
approach: The LeaderContender is only interested in confirming the leadership, 
checking for leadership and closing the session.
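
A minimal sketch of how such a contender-facing handle could look, to make the three operations concrete. Only the method names confirmLeadership, hasLeadership and the idea of closing the session come from the comment above. The contender ID string, the register method, InMemoryLeaderElectionService and the onGrantLeadership/onRevokeLeadership callbacks are hypothetical names chosen for illustration and are not taken from the proposal or from Flink.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Contender-facing handle. The exact signatures are assumptions,
// not the final Flink API.
interface LeaderElection extends AutoCloseable {
    void confirmLeadership(UUID leaderSessionId, String leaderAddress);

    boolean hasLeadership(UUID leaderSessionId);

    @Override
    void close(); // ends the session and deregisters the contender
}

// Simplified stand-in for Flink's LeaderContender (error handling omitted).
interface LeaderContender {
    void grantLeadership(UUID leaderSessionId);

    void revokeLeadership();
}

// Hypothetical in-memory service shared by several contenders.
final class InMemoryLeaderElectionService {
    private final Map<String, LeaderContender> contenders = new HashMap<>();
    private final Map<String, String> confirmedAddresses = new HashMap<>();
    private UUID currentSessionId; // null while leadership is not held

    // Registers a contender and returns a handle that is bound to it, so the
    // per-contender methods do not need a LeaderContender parameter.
    LeaderElection register(String contenderId, LeaderContender contender) {
        contenders.put(contenderId, contender);
        if (currentSessionId != null) {
            contender.grantLeadership(currentSessionId);
        }
        return new LeaderElection() {
            @Override
            public void confirmLeadership(UUID leaderSessionId, String leaderAddress) {
                if (hasLeadership(leaderSessionId)) {
                    confirmedAddresses.put(contenderId, leaderAddress);
                }
            }

            @Override
            public boolean hasLeadership(UUID leaderSessionId) {
                return contenders.containsKey(contenderId)
                        && leaderSessionId.equals(currentSessionId);
            }

            @Override
            public void close() {
                contenders.remove(contenderId);
                confirmedAddresses.remove(contenderId);
            }
        };
    }

    // Called by the (simulated) driver when this process wins the election.
    void onGrantLeadership(UUID leaderSessionId) {
        currentSessionId = leaderSessionId;
        for (LeaderContender contender : contenders.values()) {
            contender.grantLeadership(leaderSessionId);
        }
    }

    // Called by the (simulated) driver when leadership is lost.
    void onRevokeLeadership() {
        currentSessionId = null;
        confirmedAddresses.clear();
        for (LeaderContender contender : contenders.values()) {
            contender.revokeLeadership();
        }
    }

    String confirmedAddress(String contenderId) {
        return confirmedAddresses.get(contenderId);
    }
}

final class LeaderElectionSketch {
    public static void main(String[] args) {
        InMemoryLeaderElectionService service = new InMemoryLeaderElectionService();
        LeaderContender dispatcher = new LeaderContender() {
            @Override
            public void grantLeadership(UUID id) {
                System.out.println("granted " + id);
            }

            @Override
            public void revokeLeadership() {
                System.out.println("revoked");
            }
        };
        // The contender closes its own handle; the service owner needs no
        // reference to the contender to deregister it.
        try (LeaderElection election = service.register("dispatcher", dispatcher)) {
            UUID sessionId = UUID.randomUUID();
            service.onGrantLeadership(sessionId);
            election.confirmLeadership(sessionId, "dispatcher-address");
            System.out.println("is leader: " + election.hasLeadership(sessionId));
        }
    }
}
```

In this sketch the handle returned by register captures the contender's identity, which is one way to address both problems described above: the leadership check is done per contender, and closing the handle deregisters the contender without involving the owner of the service.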


> Refactoring code for multiple component leader election
> -------------------------------------------------------
>
>                 Key: FLINK-26522
>                 URL: https://issues.apache.org/jira/browse/FLINK-26522
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>    Affects Versions: 1.16.0
>            Reporter: Niklas Semmler
>            Assignee: Matthias Pohl
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: leaderelection-FLINK-26522.class.svg, 
> leaderelection-FLINK-26522.class.v2.svg, 
> leaderelection-flink-1.15+.class.svg, leaderelection-flink-1.15-.class.svg
>
>
> The current implementation of the multiple component leader election faces a 
> number of issues. These issues mostly stem from an attempt to make the 
> multiple leader election process work just the same way as the single 
> component leader election.
> An attempt at listing the issues follows:
> * *Naming* MultipleComponentLeaderElectionService appears by name similar to 
> the LeaderElectionService, but is in fact closer to the LeaderElectionDriver.
> * *Similarity* The interfaces LeaderElectionService, LeaderElectionDriver and 
> MultipleComponentLeaderElectionDriver are very similar to each other.
> * *Cyclic dependency* DefaultMultipleComponentLeaderElectionService holds a 
> reference to the ZooKeeperMultipleComponentLeaderElectionDriver 
> (MultipleComponentLeaderElectionDriver), which in turn holds a reference to 
> the DefaultMultipleComponentLeaderElectionService (LeaderLatchListener).
> * *Unclear contract* With single component leader election drivers such as 
> ZooKeeperLeaderElectionDriver a call to the LeaderElectionService#stop from 
> JobMasterServiceLeadershipRunner#closeAsync implies giving up the leadership 
> of the JobMaster. With the multiple component leader election this is no 
> longer the case. The leadership is held until the HighAvailabilityServices 
> shutdown. This logic may be difficult to understand from the perspective of 
> one of the components (e.g., the Dispatcher).
> * *Long call hierarchy* 
> DefaultLeaderElectionService->MultipleComponentLeaderElectionDriverAdapter->MultipleComponentLeaderElectionService->ZooKeeperMultipleComponentLeaderElectionDriver
> * *Long prefix* "MultipleComponentLeaderElection" is quite a long prefix but 
> shared by many classes.
> * *Adapter as primary implementation* All non-testing non-multiple-component 
> leadership drivers are deprecated. The primary implementation of 
> LeaderElectionDriver is the adapter 
> MultipleComponentLeaderElectionDriverAdapter.
> * *Possible redundancy* We currently have similar methods for the Dispatcher, 
> ResourceManager, JobMaster and WebMonitorEndpoint. (E.g., for granting 
> leadership.) As these methods are called at the same time due to the multiple 
> component leader election, it may make sense to combine this logic into a 
> single object.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
