[ https://issues.apache.org/jira/browse/FLINK-10333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952687#comment-16952687 ]
Zili Chen commented on FLINK-10333:
-----------------------------------
Here is a quick update on this thread to help coordinate with related efforts
such as FLINK-11843 and FLINK-11719. I'd also like to know your schedules for
this thread so that we can find common time to work on it together.
1. FLINK-14149 is the first step of this thread. It replaces the current
{{ZKLeaderElectionService}} with a new implementation that allows us to perform
transactional operations that check leadership. PR #9878 is a prototype of the
final state and gives a picture of what {{LeaderStore}} looks like. However, the
code actually to be checked in covers only the new generation of
{{ZKLeaderElectionService}} and the replacement work. This step is fully
backward compatible.
2. As a second step, the {{LeaderStore}} interface is introduced together with
its standalone, embedded and ZK implementations. The standalone and embedded
implementations are both in-memory but differ subtly in how they check state.
This work doesn't affect any other code, so it is fully backward compatible.
3. As a third step, {{JobGraphStore}} and {{JobRegistry}} will be replaced with
leader-store-based implementations. This lets us unify the ZK and standalone
implementations as described in the document. More details will go into the
subtasks, but overall this step also affects how the Dispatcher manages
JobMasters ({{JobManagerRunner}}), which conflicts with FLINK-11843 at the code
level and possibly at the logic level. In addition, there is a plan to keep
this step fully backward compatible for users with custom HA implementations.
4. As a fourth step, {{CheckpointStore}} and {{CheckpointIDCounter}} will be
replaced with leader-store-based implementations. The considerations are much
the same as for step 3. In particular, we should collaborate with FLINK-11719 on
a proper design of the leader election interval and endpoint lifecycle
management, especially because {{CheckpointStore}} cannot be accessed without
leadership.
Steps 3 and 4 will most likely each consist of several subtasks (logical steps
that together complete the whole step).
x. Besides, after some investigation I suspect that {{MesosWorkerStore}} doesn't
work, because the persisted state may be stale and we always have to query the
Mesos scheduler for the correct state. Since none of the steps in this series
depend on {{MesosWorkerStore}}, we can defer this change, but I'm going to give
it a dedicated pass to see whether my suspicion holds and, if so, remove it.
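Steps 1, 2 and 4 all rely on store operations that check leadership in the same
atomic step as the data access. A minimal in-memory sketch of that idea follows;
{{LeaderStore}} and {{InMemoryLeaderStore}} here are hypothetical names and not
the interface from PR #9878. Every call carries the caller's leader session ID,
and the check and the access run under one lock, so a stale leader can neither
overwrite nor read state. A ZK implementation could presumably get the same
atomicity from {{ZooKeeper#multi}} with an {{Op.check}} on the leader node's
version followed by the write.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

// Hypothetical sketch: LeaderStore and InMemoryLeaderStore are illustrative names,
// not the interface from PR #9878 or any existing Flink class.
interface LeaderStore {
    // Stores value under key only if sessionId identifies the current leader.
    boolean putIfLeader(UUID sessionId, String key, byte[] value);

    // Returns the value under key only if sessionId identifies the current leader
    // (empty both for "not leader" and "no such key" in this simplified sketch).
    Optional<byte[]> getIfLeader(UUID sessionId, String key);
}

// In-memory variant: the leadership check and the data access run under the same
// monitor, so no leadership change can slip in between them.
final class InMemoryLeaderStore implements LeaderStore {
    private final Map<String, byte[]> data = new HashMap<>();
    private UUID currentLeaderSession; // null while nobody holds leadership

    // Simulates a leader election: a fresh session ID fences off the previous leader.
    synchronized UUID grantLeadership() {
        currentLeaderSession = UUID.randomUUID();
        return currentLeaderSession;
    }

    synchronized void revokeLeadership() {
        currentLeaderSession = null;
    }

    @Override
    public synchronized boolean putIfLeader(UUID sessionId, String key, byte[] value) {
        if (!isLeader(sessionId)) {
            return false; // stale or unknown session: reject without side effects
        }
        data.put(key, value.clone());
        return true;
    }

    @Override
    public synchronized Optional<byte[]> getIfLeader(UUID sessionId, String key) {
        if (!isLeader(sessionId)) {
            return Optional.empty();
        }
        byte[] value = data.get(key);
        return value == null ? Optional.empty() : Optional.of(value.clone());
    }

    private boolean isLeader(UUID sessionId) {
        return currentLeaderSession != null && currentLeaderSession.equals(sessionId);
    }
}
```

The session ID acts as a fencing token: once a new leader is elected, every
request from the old session is rejected, which is the property step 4 needs
for {{CheckpointStore}}.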
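The unification in step 3 can be pictured as a single job graph store that is
parameterized by a backend. The sketch below is hypothetical throughout
({{LeaderStoreBackend}}, {{LeaderStoreJobGraphStore}} and {{StandaloneBackend}}
are illustrative names, and job graphs are reduced to byte arrays). It only
shows that the key layout and error handling could live in one class, while the
backends differ in durability and in how they check leadership.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical backend contract; a ZK-backed and a standalone backend would both
// implement it, so only this layer differs between HA modes.
interface LeaderStoreBackend {
    boolean put(String key, byte[] value); // false if the caller no longer holds leadership
    Optional<byte[]> get(String key);
    boolean remove(String key);            // false if the caller no longer holds leadership
    Collection<String> keys(String prefix);
}

// One job graph store for all HA modes: it decides the key layout and turns a
// rejected write into an error; durability and leadership checks are left to the backend.
final class LeaderStoreJobGraphStore {
    private static final String PREFIX = "jobgraphs/";
    private final LeaderStoreBackend backend;

    LeaderStoreJobGraphStore(LeaderStoreBackend backend) {
        this.backend = backend;
    }

    void putJobGraph(String jobId, byte[] serializedJobGraph) {
        if (!backend.put(PREFIX + jobId, serializedJobGraph)) {
            throw new IllegalStateException("Lost leadership while storing job " + jobId);
        }
    }

    Optional<byte[]> recoverJobGraph(String jobId) {
        return backend.get(PREFIX + jobId);
    }

    void removeJobGraph(String jobId) {
        if (!backend.remove(PREFIX + jobId)) {
            throw new IllegalStateException("Lost leadership while removing job " + jobId);
        }
    }

    // Returns the stored job IDs in sorted order.
    List<String> getJobIds() {
        List<String> ids = new ArrayList<>();
        for (String key : backend.keys(PREFIX)) {
            ids.add(key.substring(PREFIX.length()));
        }
        Collections.sort(ids);
        return ids;
    }
}

// Standalone backend: a single process that is always the leader, so writes never fail.
final class StandaloneBackend implements LeaderStoreBackend {
    private final Map<String, byte[]> data = new ConcurrentHashMap<>();

    @Override
    public boolean put(String key, byte[] value) {
        data.put(key, value.clone());
        return true;
    }

    @Override
    public Optional<byte[]> get(String key) {
        byte[] value = data.get(key);
        return value == null ? Optional.empty() : Optional.of(value.clone());
    }

    @Override
    public boolean remove(String key) {
        data.remove(key);
        return true;
    }

    @Override
    public Collection<String> keys(String prefix) {
        List<String> result = new ArrayList<>();
        for (String key : data.keySet()) {
            if (key.startsWith(prefix)) {
                result.add(key);
            }
        }
        return result;
    }
}
```

Under these assumptions, a ZK backend would implement the same four methods with
a leadership check in each write, and {{LeaderStoreJobGraphStore}} itself would
not need to change.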
> Rethink ZooKeeper based stores (SubmittedJobGraph, MesosWorker,
> CompletedCheckpoints)
> -------------------------------------------------------------------------------------
>
> Key: FLINK-10333
> URL: https://issues.apache.org/jira/browse/FLINK-10333
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.5.3, 1.6.0, 1.7.0
> Reporter: Till Rohrmann
> Priority: Major
> Attachments: screenshot-1.png
>
>
> While going over the ZooKeeper based stores
> ({{ZooKeeperSubmittedJobGraphStore}}, {{ZooKeeperMesosWorkerStore}},
> {{ZooKeeperCompletedCheckpointStore}}) and the underlying
> {{ZooKeeperStateHandleStore}} I noticed several inconsistencies which were
> introduced with past incremental changes.
> * Depending on whether {{ZooKeeperStateHandleStore#getAllSortedByNameAndLock}}
> or {{ZooKeeperStateHandleStore#getAllAndLock}} is called, deserialization
> problems will either lead to removing the Znode or not
> * {{ZooKeeperStateHandleStore}} leaves inconsistent state in case of
> exceptions (e.g. {{#getAllAndLock}} won't release the acquired locks in case
> of a failure)
> * {{ZooKeeperStateHandleStore}} has too many responsibilities. It would be
> better to move {{RetrievableStateStorageHelper}} out of it for a better
> separation of concerns
> * {{ZooKeeperSubmittedJobGraphStore}} overwrites a stored {{JobGraph}} even
> if it is locked. This should not happen since it could leave another system
> in an inconsistent state (imagine a changed {{JobGraph}} which restores from
> an old checkpoint)
> * Redundant but also somewhat inconsistent put logic in the different stores
> * Shadowing of ZooKeeper-specific exceptions in {{ZooKeeperStateHandleStore}}
> which were expected to be caught in {{ZooKeeperSubmittedJobGraphStore}}
> * Getting rid of the {{SubmittedJobGraphListener}} would be helpful
> These problems made me wonder how reliably these components actually work.
> Since these components are very important, I propose to refactor them.
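One way to avoid the lock leak in {{#getAllAndLock}} described above is
all-or-nothing acquisition: if locking any node fails, the locks acquired so far
are released before the failure is rethrown. This is a sketch that assumes a
hypothetical {{NodeLocks}} abstraction standing in for the ZooKeeper lock nodes;
it is not the actual {{ZooKeeperStateHandleStore}} code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ZooKeeper lock nodes; not a Flink or ZooKeeper API.
interface NodeLocks {
    void lock(String path) throws Exception;
    void release(String path);
}

final class LockAll {
    private LockAll() {}

    // Locks every path or none: if any lock attempt fails, the locks acquired so far
    // are released before the original failure is propagated.
    static List<String> lockAllOrNone(NodeLocks locks, List<String> paths) throws Exception {
        List<String> acquired = new ArrayList<>();
        try {
            for (String path : paths) {
                locks.lock(path);
                acquired.add(path);
            }
            return acquired;
        } catch (Exception e) {
            for (String path : acquired) {
                try {
                    locks.release(path);
                } catch (RuntimeException releaseFailure) {
                    // Keep the original cause; record cleanup problems alongside it.
                    e.addSuppressed(releaseFailure);
                }
            }
            throw e;
        }
    }
}
```

Attaching release failures as suppressed exceptions keeps the original cause
visible instead of shadowing it, which also relates to the exception-shadowing
point in the list above.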
--
This message was sent by Atlassian Jira
(v8.3.4#803005)