[
https://issues.apache.org/jira/browse/FLINK-26522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17651580#comment-17651580
]
Matthias Pohl commented on FLINK-26522:
---------------------------------------
I added class diagrams for the per-component leader election used in Flink
1.15- and for the per-process leader election (built on the 1.15- interfaces)
that was introduced in Flink 1.15+.
{{MultipleComponentLeaderElectionService}} maps multiple components to a single
leader election driver. In contrast, the {{LeaderElectionService}} interface
only supports a single contender per leader election driver. The
{{MultipleComponentLeaderElectionDriverAdapter}} serves as a translation point
between the two interfaces. With the 1.15- implementation removed in
FLINK-25806, we can now merge the interface offered by
{{MultipleComponentLeaderElectionService}} into {{LeaderElectionService}}.
Essentially, {{LeaderElectionService}} should provide methods that allow
multiple contenders to be assigned to a single {{LeaderElectionService}}.
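To make that concrete, a merged service could accept registrations from several contenders and fan leadership out to all of them. The following is a minimal sketch under that assumption; the type and method names ({{Contender}}, {{SketchLeaderElectionService}}, etc.) are illustrative stand-ins, not the actual Flink API:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for Flink's LeaderContender (assumed shape).
interface Contender {
    void grantLeadership(String sessionId);
    void revokeLeadership();
}

// Sketch of a merged LeaderElectionService that accepts multiple
// contenders instead of exactly one (hypothetical API).
class SketchLeaderElectionService {
    private final List<Contender> contenders = new ArrayList<>();
    private boolean hasLeadership = false;

    // Multiple components can register against the same service,
    // and therefore against the same underlying driver.
    void register(Contender contender) {
        contenders.add(contender);
        if (hasLeadership) {
            contender.grantLeadership("session-1");
        }
    }

    // Called once by the (single) underlying driver: every registered
    // component gains leadership together.
    void onLeadershipAcquired() {
        hasLeadership = true;
        for (Contender c : contenders) {
            c.grantLeadership("session-1");
        }
    }
}
```

The point of the sketch is only the cardinality change: one service, one driver, many contenders.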
An initial idea to even make the current implementation (one leader election
process for all components) being reflected in the {{HighAvailabilityServices}}
was rejected by the community in [this discussion
thread|https://lists.apache.org/thread/9oy2ml9s3j1v6r77h31sndhc3gw57cfm]. We
still want to have the flexibility to allow dedicated leader election for
individual components. The proposal mentioned in the previous paragraph will
provide us this flexibility.
This change requires a different paradigm for who owns the leader election
driver, i.e. who is in charge of the lifecycle management (starting/stopping)
of the connection to the HA backend. In the 1.15- approach,
{{HighAvailabilityServices}} provided factory methods for creating the
{{LeaderElectionService}}; the instance was then managed (i.e. started and
stopped) by the {{LeaderContender}}. The per-process leader election
implementation (1.15+) changes this paradigm. The leader election driver is
held by the {{MultipleComponentLeaderElectionService}}. There is only one
{{MultipleComponentLeaderElectionService}} instance, and it is managed
(started/stopped) by the {{HighAvailabilityServices}} implementations.
Therefore, the connection is not stopped when an individual component shuts
down, but only when the {{HighAvailabilityServices}} are cleaned up through
{{closeAndCleanupAllData}}.
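In other words, in the 1.15+ model the driver's lifecycle is bound to the HA services, not to any individual component. A simplified sketch of that ownership (all names here are made up for illustration, not Flink's actual classes):

```java
// Hypothetical stand-in for the connection to the HA backend.
class SketchDriver {
    boolean open = true;
    void close() { open = false; }
}

// Sketch: the HA services own the single driver; components only
// register/deregister and never touch the connection itself.
class SketchHaServices {
    private final SketchDriver driver = new SketchDriver();
    private int registeredComponents = 0;

    void registerComponent() { registeredComponents++; }

    // A component shutting down does NOT close the backend connection.
    void deregisterComponent() { registeredComponents--; }

    // Only the HA-services cleanup tears the connection down.
    void closeAndCleanupAllData() { driver.close(); }

    boolean connectionOpen() { return driver.open; }
}
```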
There's an alternative approach to having the lifecycle management done in
{{HighAvailabilityServices}}: we could instantiate the leader election driver
as soon as the first component registers with the {{LeaderElectionService}}.
Analogously, we would close the leader election driver as soon as the last
component deregisters from the {{LeaderElectionService}}. This would be closer
to what was in place in 1.15- and differs from what's implemented in 1.15+
with the {{MultipleComponentLeaderElectionService}}.
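That alternative amounts to reference counting on the driver: the first registration opens the backend connection, the last deregistration closes it. A purely illustrative sketch (the names are invented, not proposed API):

```java
// Hypothetical driver handle for the HA backend connection.
class RefCountedDriver {
    boolean open = false;
    void start() { open = true; }
    void close() { open = false; }
}

// Sketch of the alternative: the service creates the driver when the
// first component registers and closes it when the last one leaves.
class RefCountingElectionService {
    private final RefCountedDriver driver = new RefCountedDriver();
    private int components = 0;

    void register() {
        if (components == 0) {
            driver.start(); // first registration opens the connection
        }
        components++;
    }

    void deregister() {
        components--;
        if (components == 0) {
            driver.close(); // last deregistration closes it
        }
    }

    boolean connectionOpen() { return driver.open; }
}
```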
> Refactoring code for multiple component leader election
> -------------------------------------------------------
>
> Key: FLINK-26522
> URL: https://issues.apache.org/jira/browse/FLINK-26522
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Coordination
> Affects Versions: 1.16.0
> Reporter: Niklas Semmler
> Priority: Major
> Attachments: leaderelection-flink-1.15+.class.svg,
> leaderelection-flink-1.15-.class.svg
>
>
> The current implementation of the multiple component leader election faces a
> number of issues. These issues mostly stem from an attempt to make the
> multiple leader election process work just the same way as the single
> component leader election.
> An attempt at listing the issues follows:
> * *Naming* MultipleComponentLeaderElectionService appears by name similar to
> the LeaderElectionService, but is in fact closer to the LeaderElectionDriver.
> * *Similarity* The interfaces LeaderElectionService, LeaderElectionDriver and
> MultipleComponentLeaderElectionDriver are very similar to each other.
> * *Cyclic dependency* DefaultMultipleComponentLeaderElectionService holds a
> reference to the ZooKeeperMultipleComponentLeaderElectionDriver
> (MultipleComponentLeaderElectionDriver), which in turn holds a reference to
> the DefaultMultipleComponentLeaderElectionService (LeaderLatchListener).
> * *Unclear contract* With single component leader election drivers such as
> ZooKeeperLeaderElectionDriver a call to the LeaderElectionService#stop from
> JobMasterServiceLeadershipRunner#closeAsync implies giving up the leadership
> of the JobMaster. With the multiple component leader election this is no
> longer the case. The leadership is held until the HighAvailabilityServices
> shutdown. This logic may be difficult to understand from the perspective of
> one of the components (e.g., the Dispatcher).
> * *Long call hierarchy*
> DefaultLeaderElectionService->MultipleComponentLeaderElectionDriverAdapter->MultipleComponentLeaderElectionService->ZooKeeperMultipleComponentLeaderElectionDriver
> * *Long prefix* "MultipleComponentLeaderElection" is quite a long prefix but
> shared by many classes.
> * *Adapter as primary implementation* All non-testing non-multiple-component
> leadership drivers are deprecated. The primary implementation of
> LeaderElectionDriver is the adapter
> MultipleComponentLeaderElectionDriverAdapter.
> * *Possible redundancy* We currently have similar methods for the Dispatcher,
> ResourceManager, JobMaster and WebMonitorEndpoint. (E.g., for granting
> leadership.) As these methods are called at the same time due to the multiple
> component leader election, it may make sense to combine this logic into a
> single object.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)