[
https://issues.apache.org/jira/browse/FLINK-14149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Till Rohrmann reassigned FLINK-14149:
-------------------------------------
Assignee: (was: Zili Chen)
> Introduce ZooKeeperLeaderElectionServiceNG
> ------------------------------------------
>
> Key: FLINK-14149
> URL: https://issues.apache.org/jira/browse/FLINK-14149
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Coordination
> Reporter: Zili Chen
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Following the discussion in FLINK-10333, we reached a consensus to
> refactor the ZK-based storage with a transaction store mechanism. The overall
> design can be found in the design document linked below.
> This subtask introduces the prerequisite for adopting the transaction
> store, i.e., a new leader election service for the ZK scenario. It is needed
> because we have to retrieve the corresponding latch path per contender,
> following the algorithm described in FLINK-10333.
> Here are the (descriptive) details of the implementation.
> We adopt an optimized version of [this
> recipe|https://zookeeper.apache.org/doc/current/recipes.html#sc_leaderElection][1].
> Code details can be found in [this
> branch|https://github.com/TisonKun/flink/tree/election-service] and the state
> machine can be found in the attached design document. Only the most
> important difference from the former implementation is described here:
> *Leader election is a one-shot service.*
> Specifically, we create only one latch for a specific contender. We tolerate
> {{SUSPENDED}} a.k.a. {{CONNECTIONLOSS}}, so the only situation in which we lose
> leadership is session expiration, which implies that the ephemeral latch znode
> has been deleted. We don't re-participate as a contender, so after
> {{revokeLeadership}} a contender will never be granted leadership again. This
> is not a problem, but we can refactor the contender side further for better
> behavior.
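>
> As a rough illustration of the one-shot behavior described above, the
> following Java sketch models the contender as a small state machine. All
> names in it ({{OneShotElection}}, {{ConnectionEvent}}, {{onLatchAcquired}})
> are hypothetical and are not taken from the linked branch; {{LOST}} is assumed
> to stand in for session expiration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a one-shot contender; not the code from the branch. */
final class OneShotElection {
    /** Assumed connection events; LOST stands in for session expiration. */
    enum ConnectionEvent { CONNECTED, SUSPENDED, RECONNECTED, LOST }

    enum State { WAITING, LEADER, REVOKED }

    private State state = State.WAITING;
    private final List<String> callbacks = new ArrayList<>();

    /** Invoked when this contender's latch znode becomes the smallest one. */
    void onLatchAcquired() {
        if (state == State.WAITING) {
            state = State.LEADER;
            callbacks.add("grantLeadership");
        }
        // REVOKED is terminal: a one-shot contender is never granted again.
    }

    void onConnectionEvent(ConnectionEvent event) {
        switch (event) {
            case LOST:
                // Session expired, so the ephemeral latch znode is gone.
                if (state == State.LEADER) {
                    callbacks.add("revokeLeadership");
                }
                state = State.REVOKED;
                break;
            default:
                // CONNECTED, SUSPENDED and RECONNECTED are tolerated: no change.
                break;
        }
    }

    State state() {
        return state;
    }

    /** Names of the contender callbacks that would have been invoked so far. */
    List<String> callbacks() {
        return callbacks;
    }
}
```

> In this model a {{SUSPENDED}} event leaves a leader in place, while a
> single {{LOST}} event moves the contender to a terminal state from which a
> later latch notification has no effect.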
> Another topic is the interface. Going back to the big picture of FLINK-10333,
> we will eventually use a transaction store for persisting job graphs,
> checkpoints and so on. So a {{getLeaderStore}} method will be added to
> {{LeaderElectionServices}}. Because we don't use it at all yet, it is an open
> question whether we add the method to the interface in this subtask and,
> if so, whether we implement it for the other election service implementations.
> {{concealLeaderInfo}} is another method that appears in the document; it is
> aimed at cleaning up the leader info node on stop. It raises the same question
> as {{getLeaderStore}}.
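>
> To make the open question concrete, here is a sketch of one possible
> option only: adding both methods with default implementations, so that
> other election services would not have to change. {{ElectionServiceSketch}}
> and {{LeaderStore}} are hypothetical placeholders, and the signatures are
> assumptions rather than the ones in the design document.

```java
import java.util.Optional;

/** Hypothetical placeholder for the transaction store from FLINK-10333. */
interface LeaderStore {}

/** Hypothetical stand-in for the election service interface under discussion. */
interface ElectionServiceSketch {
    void start() throws Exception;

    void stop() throws Exception;

    /** Assumed default: services without a transaction store expose none. */
    default Optional<LeaderStore> getLeaderStore() {
        return Optional.empty();
    }

    /** Assumed default: services without a leader info node have nothing to clean up. */
    default void concealLeaderInfo() throws Exception {
        // Intentionally a no-op in this sketch.
    }
}
```

> Under this option only the ZK-based service would override the two
> methods; whether that is preferable to leaving them out of the interface for
> now remains the open question above.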
> **What we gain**
> 1. The basis for the overall goal of FLINK-10333.
> 2. The leader info node can only be modified by the current leader. Thus we
> can remove a lot of concurrency handling logic from the current ZLES, including
> the use of {{NodeCache}} and the handling of the complex stat of the ephemeral
> leader info node.
> [1] For other implementations, I started [a
> thread|https://lists.apache.org/x/thread.html/594b66ecb1d60b560a5c4c08ed1b2a67bc29143cb4e8d368da8c39b2@%3Cuser.zookeeper.apache.org%3E]
> in the ZK and Curator communities to discuss. In any case, these are
> implementation details only; interfaces and semantics should not be affected.
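>
> For readers unfamiliar with the recipe referenced in [1]: each contender
> creates an ephemeral sequential znode, the smallest sequence number is the
> leader, and every other contender watches only its immediate predecessor.
> The Java sketch below shows that ordering step in isolation, without a
> ZooKeeper client. {{LatchOrdering}} and the {{latch-}} znode prefix are
> hypothetical; the 10-digit suffix is the format ZooKeeper appends to
> sequential znodes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper showing the ordering step of the election recipe. */
final class LatchOrdering {
    /** Parses the 10-digit sequence suffix of a sequential znode name. */
    static int sequenceOf(String znodeName) {
        return Integer.parseInt(znodeName.substring(znodeName.length() - 10));
    }

    /**
     * Returns the znode this contender should watch, or empty if the
     * contender owns the smallest latch and is therefore the leader.
     */
    static Optional<String> nodeToWatch(List<String> children, String self) {
        List<String> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparingInt(LatchOrdering::sequenceOf));
        int index = sorted.indexOf(self);
        if (index < 0) {
            throw new IllegalStateException("own latch znode missing: " + self);
        }
        // Watching only the predecessor avoids waking every contender at once.
        return index == 0 ? Optional.empty() : Optional.of(sorted.get(index - 1));
    }
}
```

> Sequence counter overflow is ignored in this sketch.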
--
This message was sent by Atlassian Jira
(v8.3.4#803005)