[ 
https://issues.apache.org/jira/browse/FLINK-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16817441#comment-16817441
 ] 

TisonKun edited comment on FLINK-10333 at 4/14/19 8:30 PM:
-----------------------------------------------------------

[~till.rohrmann] I drafted a document about using transactions to ensure that 
only the leader can modify znodes. Here is the link: 
https://docs.google.com/document/d/1cBY1t0k5g1xNqzyfZby3LcPu4t-wpx57G1xf-nmWrCo/edit?usp=sharing

It is mostly about an interface through which we achieve leader election as well as 
interact with ZooKeeper. Since we couple access control with leadership, this is reasonable.
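To illustrate the idea of coupling writes with leadership, here is a minimal sketch, assuming each contender knows the path of the (ephemeral) latch node it created when it won the election. Every write is bundled in one atomic transaction with a check that this latch node still exists, so a former leader whose node was removed cannot overwrite state. {{InMemoryZk}}, {{LeaderGuardedWriter}} and the latch paths are hypothetical stand-ins rather than Flink or ZooKeeper classes; against a real ensemble this would roughly correspond to a {{multi}} call combining an {{Op.check}} on the latch node with the actual write. The design document remains the reference for the actual proposal.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory stand-in for ZooKeeper: znodes plus an all-or-nothing multi(). */
final class InMemoryZk {
    private final Map<String, byte[]> nodes = new HashMap<>();

    /** One transaction step: a check that a node exists, or an overwrite of an existing node. */
    static final class Op {
        final String path;
        final byte[] payload; // null means "only check that the node exists"

        private Op(String path, byte[] payload) {
            this.path = path;
            this.payload = payload;
        }

        static Op checkExists(String path) { return new Op(path, null); }

        static Op setData(String path, byte[] payload) { return new Op(path, payload); }
    }

    synchronized void create(String path, byte[] payload) { nodes.put(path, payload); }

    synchronized void delete(String path) { nodes.remove(path); }

    synchronized byte[] get(String path) { return nodes.get(path); }

    /** Applies the writes only if every referenced node exists; otherwise changes nothing. */
    synchronized boolean multi(List<Op> ops) {
        for (Op op : ops) {
            if (!nodes.containsKey(op.path)) {
                return false; // latch node gone or target missing: abort the whole transaction
            }
        }
        for (Op op : ops) {
            if (op.payload != null) {
                nodes.put(op.path, op.payload);
            }
        }
        return true;
    }
}

/** Hypothetical writer that bundles every write with a check on its own leader latch node. */
final class LeaderGuardedWriter {
    private final InMemoryZk zk;
    private final String latchPath; // node this contender created when it won the election

    LeaderGuardedWriter(InMemoryZk zk, String latchPath) {
        this.zk = zk;
        this.latchPath = latchPath;
    }

    /** Returns false instead of writing if this contender no longer holds leadership. */
    boolean write(String path, byte[] payload) {
        return zk.multi(List.of(
                InMemoryZk.Op.checkExists(latchPath),
                InMemoryZk.Op.setData(path, payload)));
    }
}
```

Because the check and the write are validated together before anything is applied, there is no window in which the leader could lose its latch node between checking and writing.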

For backward compatibility, as described in the design document, we do not change the 
content of znodes, and thus I think it can simply be switched. However, I'm also 
strongly in favor of adding a new {{ZooKeeperNGHighAvailabilityServices}} 
implementation which makes the whole implementation opt-in at first.

Additionally, regarding backward compatibility, as we discussed before, the znode 
layout could be reorganized to meet the requirement of providing namespaces. 
Concretely, the JobManager only modifies data under its corresponding job_id, and 
the Dispatcher only modifies data under its corresponding cluster_id. Further, I 
would prefer to add a "data/" entry for each xxx_id (like a shade) and keep the 
election logic invisible when interacting with ZooKeeper for persisting state. 
Anyway, it is an orthogonal topic.
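The namespacing idea above can be sketched as a path layout in which each owner (the Dispatcher per cluster_id, a JobManager per job_id) gets a separate election subtree and a "data/" subtree, and stores only ever resolve paths relative to their own "data/" entry. This is a minimal sketch under assumptions: the class {{ZnodeNamespace}}, the root path passed to it, and the "leader" and "jobs" path segments are all hypothetical and not part of the design document.

```java
/**
 * Hypothetical znode layout: every owner (the cluster's Dispatcher, or one job's JobManager)
 * gets an election subtree and a separate "data/" subtree for persisted state.
 */
final class ZnodeNamespace {
    private final String clusterRoot;

    ZnodeNamespace(String rootPath, String clusterId) {
        this.clusterRoot = rootPath + "/" + clusterId;
    }

    String dispatcherElection() { return clusterRoot + "/leader"; }

    String dispatcherData() { return clusterRoot + "/data"; }

    String jobElection(String jobId) { return clusterRoot + "/jobs/" + jobId + "/leader"; }

    String jobData(String jobId) { return clusterRoot + "/jobs/" + jobId + "/data"; }

    /** Stores pass only paths relative to their own data/ entry; election nodes stay invisible. */
    static String resolve(String dataRoot, String relativePath) {
        if (relativePath.isEmpty() || relativePath.startsWith("/") || relativePath.contains("..")) {
            throw new IllegalArgumentException(
                    "path must stay inside " + dataRoot + ": " + relativePath);
        }
        return dataRoot + "/" + relativePath;
    }

    /** True if the path is the given data subtree or lies below it, i.e. its owner may modify it. */
    static boolean isWithin(String dataRoot, String path) {
        return path.equals(dataRoot) || path.startsWith(dataRoot + "/");
    }
}
```

With this layout the Dispatcher's data subtree and every job's data subtree are disjoint, so an ownership check reduces to a prefix test.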


was (Author: tison):
[~till.rohrmann] I drafted a document about using transactions to ensure that 
only the leader can modify znodes. Here is the link: 
https://docs.google.com/document/d/1cBY1t0k5g1xNqzyfZby3LcPu4t-wpx57G1xf-nmWrCo/edit?usp=sharing

It is mostly about an interface through which we achieve leader election as well as 
interact with ZooKeeper. Since we couple access control with leadership, this is reasonable.

For backward compatibility, as the design document describes, we do not modify the 
content of znodes, and thus I think it can simply be switched. However, I'm also 
strongly in favor of adding a new implementation which makes the whole change opt-in 
at first.

Additionally, regarding backward compatibility, as we discussed before, the znode 
layout could be reorganized to meet the requirement of providing namespaces. 
Concretely, the JobManager only modifies data under its corresponding job_id, and 
the Dispatcher only modifies data under its corresponding cluster_id. Further, I 
would prefer to add a "data/" entry for each xxx_id (like a shade) and keep the 
election logic invisible when interacting with ZooKeeper for persisting state. 
Anyway, it is an orthogonal topic.

> Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker, 
> CompletedCheckpoints)
> -------------------------------------------------------------------------------------
>
>                 Key: FLINK-10333
>                 URL: https://issues.apache.org/jira/browse/FLINK-10333
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Coordination
>    Affects Versions: 1.5.3, 1.6.0, 1.7.0
>            Reporter: Till Rohrmann
>            Priority: Major
>
> While going over the ZooKeeper based stores 
> ({{ZooKeeperSubmittedJobGraphStore}}, {{ZooKeeperMesosWorkerStore}}, 
> {{ZooKeeperCompletedCheckpointStore}}) and the underlying 
> {{ZooKeeperStateHandleStore}} I noticed several inconsistencies which were 
> introduced with past incremental changes.
> * Depending on whether {{ZooKeeperStateHandleStore#getAllSortedByNameAndLock}} 
> or {{ZooKeeperStateHandleStore#getAllAndLock}} is called, deserialization 
> problems will either lead to removing the Znode or not
> * {{ZooKeeperStateHandleStore}} leaves inconsistent state in case of 
> exceptions (e.g. {{#getAllAndLock}} won't release the acquired locks in case 
> of a failure)
> * {{ZooKeeperStateHandleStore}} has too many responsibilities. It would be 
> better to move {{RetrievableStateStorageHelper}} out of it for a better 
> separation of concerns
> * {{ZooKeeperSubmittedJobGraphStore}} overwrites a stored {{JobGraph}} even 
> if it is locked. This should not happen since it could leave another system 
> in an inconsistent state (imagine a changed {{JobGraph}} which restores from 
> an old checkpoint)
> * Redundant but also somewhat inconsistent put logic in the different stores
> * Shadowing of ZooKeeper specific exceptions in {{ZooKeeperStateHandleStore}} 
> which were expected to be caught in {{ZooKeeperSubmittedJobGraphStore}}
> * Getting rid of the {{SubmittedJobGraphListener}} would be helpful
> These problems made me question how reliably these components actually work. 
> Since these components are very important, I propose to refactor them.
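Regarding the point that {{#getAllAndLock}} does not release already acquired locks on failure, a minimal sketch of all-or-nothing locking could look as follows: locks are taken one by one, and if any step fails, every lock acquired so far is released before the original exception is rethrown. {{NodeLock}}, {{AllOrNothingLocking}} and {{lockAll}} are hypothetical names for illustration, not the existing {{ZooKeeperStateHandleStore}} API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical abstraction over the per-znode lock (e.g. an ephemeral lock child node). */
interface NodeLock {
    void lock(String path) throws Exception;

    void release(String path);
}

final class AllOrNothingLocking {
    /** Locks every path, or releases the locks already taken and rethrows the first failure. */
    static List<String> lockAll(List<String> paths, NodeLock locks) throws Exception {
        List<String> acquired = new ArrayList<>();
        try {
            for (String path : paths) {
                locks.lock(path);
                acquired.add(path); // track only locks that were actually obtained
            }
        } catch (Exception e) {
            for (String path : acquired) {
                try {
                    locks.release(path);
                } catch (RuntimeException releaseFailure) {
                    e.addSuppressed(releaseFailure); // keep the original cause primary
                }
            }
            throw e;
        }
        return acquired;
    }
}
```

Keeping the cleanup next to the acquisition loop means callers never observe a partially locked state, which addresses the inconsistency described in the second bullet.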



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
