[ 
https://issues.apache.org/jira/browse/FLINK-14149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann reassigned FLINK-14149:
-------------------------------------

    Assignee:     (was: Zili Chen)

> Introduce ZooKeeperLeaderElectionServiceNG
> ------------------------------------------
>
>                 Key: FLINK-14149
>                 URL: https://issues.apache.org/jira/browse/FLINK-14149
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>            Reporter: Zili Chen
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Following the discussion in FLINK-10333, we reached a consensus to 
> refactor the ZK-based storage with a transaction store mechanism. The overall 
> design can be found in the design document linked below.
> This subtask introduces the prerequisite for adopting the transaction 
> store, i.e., a new leader election service for the ZK scenario. It is needed 
> because we have to retrieve the corresponding latch path per contender, following 
> the algorithm described in FLINK-10333.
> Here are the (descriptive) details of the implementation.
> We adopt an optimized version of [this 
> recipe|https://zookeeper.apache.org/doc/current/recipes.html#sc_leaderElection][1].
> Code details can be found in [this 
> branch|https://github.com/TisonKun/flink/tree/election-service] and the state 
> machine can be found in the attached design document. Only the most 
> important difference from the former implementation is described here:
> *Leader election is a one-shot service.*
> Specifically, we create only one latch for a specific contender. We tolerate 
> {{SUSPENDED}} a.k.a. {{CONNECTIONLOSS}}, so the only situation in which we lose 
> leadership is session expiry, which implies that the ephemeral latch znode has 
> been deleted. We don't re-participate as a contender, so after 
> {{revokeLeadership}} a contender will never be granted leadership again. This 
> is not a problem, but we can refactor the contender side further for better 
> behavior.
> Another topic is the interface. Back to the big picture of FLINK-10333: we 
> will eventually use a transaction store for persisting the job graph, 
> checkpoints, and so on. So a {{getLeaderStore}} method will be added to 
> {{LeaderElectionServices}}. Because we don't use it at all at this point, it 
> is an open question whether we should add the method to the interface in this 
> subtask and, if so, whether we should implement it for the other election 
> service implementations.
> {{concealLeaderInfo}} is another method that appears in the document; it is 
> meant to clean up the leader info node on stop. The same question applies to 
> it as to {{getLeaderStore}}.
> **What we gain**
> 1. A basis for the overall goal of FLINK-10333
> 2. The leader info node can be modified only by the current leader. Thus we can 
> remove a lot of the concurrency handling logic in the current ZLES, including 
> the use of {{NodeCache}} as well as dealing with the complex stat of the 
> ephemeral leader info node.
> [1] For other implementations, I started [a 
> thread|https://lists.apache.org/x/thread.html/594b66ecb1d60b560a5c4c08ed1b2a67bc29143cb4e8d368da8c39b2@%3Cuser.zookeeper.apache.org%3E]
>  in ZK and Curator to discuss them. In any case, this concerns implementation 
> details only; interfaces and semantics should not be affected.
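
The one-shot semantics in the quoted description can be illustrated with a
minimal Java sketch. It is not the code from the linked branch: the class
OneShotLatchSketch, its State enum, and its method names are hypothetical, and
ZooKeeper is replaced by plain method calls. The event names SUSPENDED,
RECONNECTED, and LOST are assumed to mirror Curator's ConnectionState values,
with LOST standing in for session expiry.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a one-shot leader latch; not Flink's or Curator's API. */
public class OneShotLatchSketch {

    /** Connection events, named after Curator's ConnectionState values (assumption). */
    enum ConnectionEvent { SUSPENDED, RECONNECTED, LOST }

    /** WAITING: latch created, not yet leader. REVOKED is terminal. */
    enum State { WAITING, LEADER, REVOKED }

    private State state = State.WAITING;
    private final List<String> callbacks = new ArrayList<>();

    /** Invoked when this contender's latch znode becomes the leading one. */
    void onLatchAcquired() {
        if (state == State.WAITING) {
            state = State.LEADER;
            callbacks.add("grantLeadership");
        }
        // In REVOKED nothing happens: the contender never re-participates.
    }

    /** Invoked on connection state changes reported by the ZK client. */
    void onConnectionEvent(ConnectionEvent event) {
        switch (event) {
            case SUSPENDED:
            case RECONNECTED:
                // Connection loss is tolerated; leadership is kept.
                break;
            case LOST:
                // Session expired: the ephemeral latch znode is gone.
                if (state == State.LEADER) {
                    callbacks.add("revokeLeadership");
                }
                state = State.REVOKED;
                break;
        }
    }

    State state() {
        return state;
    }

    List<String> callbacks() {
        return callbacks;
    }

    public static void main(String[] args) {
        OneShotLatchSketch latch = new OneShotLatchSketch();
        latch.onLatchAcquired();
        latch.onConnectionEvent(ConnectionEvent.SUSPENDED);   // still leader
        latch.onConnectionEvent(ConnectionEvent.RECONNECTED); // still leader
        latch.onConnectionEvent(ConnectionEvent.LOST);        // leadership revoked
        latch.onLatchAcquired();                              // ignored: one-shot
        System.out.println(latch.state() + " " + latch.callbacks());
    }
}
```

In this sketch, a contender that wants to run for leadership again after
revocation would have to create a new latch instance, which matches the
"never granted any more" behavior described in the issue.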



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
