[ https://issues.apache.org/jira/browse/FLINK-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954577#comment-16954577 ]

Zili Chen edited comment on FLINK-10333 at 10/18/19 1:33 PM:
-------------------------------------------------------------

Hello all. Collecting the thoughts above, I notice that with the consensus on a 
leader store, specifically a leader transaction store[1][2], we actually turn to 
a different abstraction from the current high-availability services.

Currently, we obtain the job graph store, checkpoint store, and so on from the 
high-availability services themselves, which means they are tied only to the 
implementation (standalone/ZooKeeper) while the internals remain a total black 
box. With leader store based high-availability services, we store all metadata 
in a leader store so that every write operation is performed with leadership. 
Conceptually we no longer have a job graph store or a checkpoint store; we just 
use the leader store for the same functionality[3].
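
To make the direction concrete, below is a minimal sketch of what such a 
leader store interface could look like. All names here ({{LeaderStore}}, 
{{put}}, {{get}}, {{remove}}) are my illustration for this discussion, not an 
existing Flink API.

{code:java}
import java.util.Optional;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of a leader store: every write is fenced by leadership,
// so a process that has lost leadership can no longer mutate HA metadata.
public interface LeaderStore {

    /**
     * Writes the value under the given path only while this process holds
     * leadership; otherwise the returned future completes exceptionally.
     */
    CompletableFuture<Void> put(String path, byte[] value);

    /** Reads the value stored under the given path, if present. */
    CompletableFuture<Optional<byte[]>> get(String path);

    /** Removes the value under the given path, again fenced by leadership. */
    CompletableFuture<Void> remove(String path);
}
{code}

With such an interface, the job graph store and checkpoint store would become 
thin serialization adapters on top of the leader store instead of independent 
ZooKeeper clients.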

Thus, I'm afraid the proposal above is not merely a new implementation of 
high availability but a refactoring of the current high-availability 
abstraction.

Metadata storage is a general abstraction shared among multiple projects[4], 
whose behavior we can reason about given a proper design. Before discussing 
this refactoring topic, I'd like to align our consensus here, so that we are 
all sure about the meaning and impact of turning to leader store based 
metadata storage. Please share your concerns so that we reach a mutual 
understanding.

[1] 
https://lists.apache.org/x/thread.html/0839a4fb972ffdf65c8f301b94509bfbba2c3ed41c4ad32c9d3e87d2@%3Cuser.curator.apache.org%3E
[2] https://gist.github.com/Randgalt/1a19dcd215e202936e5b92c121fc73de

[1] & [2] are the discussions I raised in the ZooKeeper & Curator communities 
about a leader store. We reached a consensus on the implementation for 
coordination. The leader store is itself a storage, and we use it for storing 
the job graph, so the implementation differs at the leader store level, not in 
the job graph store. For ZooKeeper's limited storage, I have a general 
abstraction of external storage and a trivial implementation for the 
non-external case, but the details are deferred from this comment.
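
To illustrate that abstraction, here is a hedged sketch; {{ExternalStorage}} 
and {{InlineStorage}} are hypothetical names for this comment only. Large 
payloads would go to an external system (e.g. a DFS) with only a small pointer 
kept in the leader store, while the trivial non-external implementation 
inlines the payload into the pointer itself.

{code:java}
import java.util.Base64;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: ZooKeeper znodes are size-limited (roughly 1 MB by
// default), so large metadata goes to external storage and only a small
// pointer is kept in the leader store.
public interface ExternalStorage {

    /** Persists the payload externally and returns a pointer to it. */
    CompletableFuture<String> store(byte[] payload);

    /** Resolves a previously returned pointer back to the payload. */
    CompletableFuture<byte[]> retrieve(String pointer);
}

// Trivial "non-external" implementation: the pointer embeds the payload
// itself, so everything still lives inside the leader store.
class InlineStorage implements ExternalStorage {

    @Override
    public CompletableFuture<String> store(byte[] payload) {
        return CompletableFuture.completedFuture(
                Base64.getEncoder().encodeToString(payload));
    }

    @Override
    public CompletableFuture<byte[]> retrieve(String pointer) {
        return CompletableFuture.completedFuture(
                Base64.getDecoder().decode(pointer));
    }
}
{code}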

[3] We can keep the job graph store interface and similar interfaces as 
lightweight testing hooks and for per-job "submission" tweaks.
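
For example (again with hypothetical names), such an interface could survive 
as a thin adapter over the leader store, which keeps it easy to mock in tests:

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: the job graph store kept as a thin adapter over the
// leader store, useful as a lightweight testing hook.
class LeaderStoreJobGraphStore {

    private final LeaderStore leaderStore;   // sketched earlier in this comment
    private final ExternalStorage storage;   // sketched earlier in this comment

    LeaderStoreJobGraphStore(LeaderStore leaderStore, ExternalStorage storage) {
        this.leaderStore = leaderStore;
        this.storage = storage;
    }

    /** Stores a serialized job graph; the final write is fenced by leadership. */
    CompletableFuture<Void> putJobGraph(String jobId, byte[] serializedJobGraph) {
        return storage.store(serializedJobGraph)
                .thenCompose(pointer -> leaderStore.put(
                        "/jobgraphs/" + jobId,
                        pointer.getBytes(StandardCharsets.UTF_8)));
    }
}
{code}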

[4] https://github.com/apache/pulsar/wiki/PIP-45%3A-Pluggable-metadata-interface


> Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, 
> CompletedCheckpoints)
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-10333
>                 URL: https://issues.apache.org/jira/browse/FLINK-10333
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.5.3, 1.6.0, 1.7.0
>            Reporter: Till Rohrmann
>            Priority: Major
>         Attachments: screenshot-1.png
>
>
> While going over the ZooKeeper based stores 
> ({{ZooKeeperSubmittedJobGraphStore}}, {{ZooKeeperMesosWorkerStore}}, 
> {{ZooKeeperCompletedCheckpointStore}}) and the underlying 
> {{ZooKeeperStateHandleStore}} I noticed several inconsistencies which were 
> introduced with past incremental changes.
> * Depending on whether {{ZooKeeperStateHandleStore#getAllSortedByNameAndLock}} 
> or {{ZooKeeperStateHandleStore#getAllAndLock}} is called, deserialization 
> problems will either lead to removing the Znode or not
> * {{ZooKeeperStateHandleStore}} leaves inconsistent state in case of 
> exceptions (e.g. {{#getAllAndLock}} won't release the acquired locks in case 
> of a failure)
> * {{ZooKeeperStateHandleStore}} has too many responsibilities. It would be 
> better to move {{RetrievableStateStorageHelper}} out of it for a better 
> separation of concerns
> * {{ZooKeeperSubmittedJobGraphStore}} overwrites a stored {{JobGraph}} even 
> if it is locked. This should not happen since it could leave another system 
> in an inconsistent state (imagine a changed {{JobGraph}} which restores from 
> an old checkpoint)
> * Redundant but also somewhat inconsistent put logic in the different stores
> * Shadowing of ZooKeeper specific exceptions in {{ZooKeeperStateHandleStore}} 
> which were expected to be caught in {{ZooKeeperSubmittedJobGraphStore}}
> * Getting rid of the {{SubmittedJobGraphListener}} would be helpful
> These problems made me wonder how reliably these components actually work. 
> Since these components are very important, I propose to refactor them.



