[jira] [Commented] (FLINK-10333) Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, CompletedCheckpoints)

Zili Chen (Jira) Wed, 16 Oct 2019 03:08:34 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952687#comment-16952687
 ]


Zili Chen commented on FLINK-10333:
-----------------------------------

A quick update on this thread to coordinate better with related efforts such 
as FLINK-11843 & FLINK-11719. I'd also like to know everyone's schedule for 
this work so that we can find common time slots for it.

1. FLINK-14149 is the first step: it replaces the current 
{{ZKLeaderElectionService}} with a new implementation that allows us to 
perform transactional operations that check leadership. PR #9878 is a 
prototype of the final state and gives an impression of {{LeaderStore}}. 
However, the actual code diff to be checked in consists only of the new 
generation of {{ZKLeaderElectionService}} and the replacement work. This work 
is fully backward compatible.

2. As a second step, the {{LeaderStore}} interface is introduced with its 
standalone, embedded and ZK implementations. The former two are both 
in-memory but differ subtly in state checking. This work doesn't affect any 
other code, so it is fully backward compatible.

3. As a third step, {{JobGraphStore}} & {{JobRegistry}} will be replaced with 
leader-store-based implementations. We can unify the ZK and standalone 
implementations as described in the document. More details will go into the 
subtasks, but overall this step also affects how the Dispatcher manages the 
JobMaster (JobManagerRunner), which conflicts in code, and possibly in logic, 
with FLINK-11843. Besides, there is a plan to ensure full backward 
compatibility for custom HA users.

4. As a fourth step, {{CheckpointStore}} & {{CheckpointIDCounter}} will be 
replaced with leader-store-based implementations. The remarks are much the 
same as for step 3. Specifically, we should collaborate with FLINK-11719 on a 
proper design of the leader election interval and endpoint lifecycle 
management. In particular, we cannot access {{CheckpointStore}} without 
leadership.

Steps 3 & 4 will very likely each contain several subtasks (logical steps 
that together achieve the whole).
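
To make the idea of "transactional operations that check leadership" concrete, 
here is a minimal in-memory sketch. It is not the code from PR #9878: the class 
{{InMemoryLeaderStore}} and the methods {{grantLeadership}}, 
{{revokeLeadership}} and {{putIfLeader}} are hypothetical names chosen for 
illustration. The only point it shows is that the leadership check and the 
write happen atomically, so a contender that has lost leadership cannot modify 
the store.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

/** Hypothetical in-memory store: every write is fenced by a leader session ID. */
final class InMemoryLeaderStore {
    private final Object lock = new Object();
    private final Map<String, byte[]> entries = new HashMap<>();
    private UUID currentLeaderSession; // null while nobody holds leadership

    /** Grants leadership to a new session and invalidates the previous one. */
    UUID grantLeadership() {
        synchronized (lock) {
            currentLeaderSession = UUID.randomUUID();
            return currentLeaderSession;
        }
    }

    /** Revokes leadership; all writes are rejected until it is granted again. */
    void revokeLeadership() {
        synchronized (lock) {
            currentLeaderSession = null;
        }
    }

    /**
     * Writes the value only if the given session is still the leader.
     * The check and the write happen under one lock, i.e. as one transaction.
     */
    boolean putIfLeader(UUID session, String key, byte[] value) {
        synchronized (lock) {
            if (session == null || !session.equals(currentLeaderSession)) {
                return false; // stale or missing leadership: reject the write
            }
            entries.put(key, value.clone());
            return true;
        }
    }

    /** Reads are not fenced in this sketch. */
    Optional<byte[]> get(String key) {
        synchronized (lock) {
            return Optional.ofNullable(entries.get(key)).map(byte[]::clone);
        }
    }
}
```

In this sketch, a write with the session returned by the latest 
{{grantLeadership}} call succeeds, while a write with an older session returns 
{{false}} and leaves the stored value untouched. A ZK-backed variant would need 
an equivalent atomic check against the leader's znode; how that is done is left 
open here.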

x. Besides, after some investigation I suspect {{MesosWorkerStore}} doesn't 
work, because the persisted state is possibly stale and we always have to ask 
the Mesos scheduler for the correct state. Given that this series of efforts 
doesn't depend on {{MesosWorkerStore}}, we can defer the change, but I'm going 
to give it a dedicated pass to see whether my suspicion holds and, if so, 
remove it.

> Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, 
> CompletedCheckpoints)
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-10333
>                 URL: https://issues.apache.org/jira/browse/FLINK-10333
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.5.3, 1.6.0, 1.7.0
>            Reporter: Till Rohrmann
>            Priority: Major
>         Attachments: screenshot-1.png
>
>
> While going over the ZooKeeper based stores 
> ({{ZooKeeperSubmittedJobGraphStore}}, {{ZooKeeperMesosWorkerStore}}, 
> {{ZooKeeperCompletedCheckpointStore}}) and the underlying 
> {{ZooKeeperStateHandleStore}} I noticed several inconsistencies which were 
> introduced with past incremental changes.
> * Depending on whether {{ZooKeeperStateHandleStore#getAllSortedByNameAndLock}} 
> or {{ZooKeeperStateHandleStore#getAllAndLock}} is called, deserialization 
> problems will either lead to removing the Znode or not
> * {{ZooKeeperStateHandleStore}} leaves inconsistent state in case of 
> exceptions (e.g. {{#getAllAndLock}} won't release the acquired locks in case 
> of a failure)
> * {{ZooKeeperStateHandleStore}} has too many responsibilities. It would be 
> better to move {{RetrievableStateStorageHelper}} out of it for a better 
> separation of concerns
> * {{ZooKeeperSubmittedJobGraphStore}} overwrites a stored {{JobGraph}} even 
> if it is locked. This should not happen since it could leave another system 
> in an inconsistent state (imagine a changed {{JobGraph}} which restores from 
> an old checkpoint)
> * Redundant but also somewhat inconsistent put logic in the different stores
> * Shadowing of ZooKeeper specific exceptions in {{ZooKeeperStateHandleStore}} 
> which were expected to be caught in {{ZooKeeperSubmittedJobGraphStore}}
> * Getting rid of the {{SubmittedJobGraphListener}} would be helpful
> These problems made me wonder how reliably these components actually work. 
> Since these components are very important, I propose to refactor them.
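
The lock-leak problem noted for {{#getAllAndLock}} can be illustrated with a 
small in-memory sketch. This is not the {{ZooKeeperStateHandleStore}} code: 
{{LockingStateStore}} and its methods are hypothetical, and 
{{Integer.parseInt}} merely stands in for a deserialization step that may 
fail. It shows one way to make sure that locks taken during a failed call are 
released again.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical store that locks entries while reading them. */
final class LockingStateStore {
    private final Map<String, String> rawEntries = new LinkedHashMap<>();
    private final Set<String> locked = new HashSet<>();

    void put(String name, String raw) {
        rawEntries.put(name, raw);
    }

    /** Returns a copy of the names that are currently locked. */
    Set<String> lockedNames() {
        return new HashSet<>(locked);
    }

    /**
     * Locks and parses every entry. If any entry fails to parse, all locks
     * acquired by this call are released before the exception propagates.
     */
    List<Integer> getAllAndLock() {
        List<String> acquired = new ArrayList<>();
        List<Integer> result = new ArrayList<>();
        boolean success = false;
        try {
            for (Map.Entry<String, String> entry : rawEntries.entrySet()) {
                // Remember only the locks this call actually took.
                if (locked.add(entry.getKey())) {
                    acquired.add(entry.getKey());
                }
                // Stand-in for deserialization; may throw NumberFormatException.
                result.add(Integer.parseInt(entry.getValue()));
            }
            success = true;
            return result;
        } finally {
            if (!success) {
                locked.removeAll(acquired); // roll back on failure
            }
        }
    }
}
```

In this sketch, a call that fails halfway leaves {{lockedNames()}} exactly as 
it was before the call, while locks held from earlier successful calls are 
not touched.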



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
