[ https://issues.apache.org/jira/browse/FLINK-13633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921301#comment-16921301 ]
Yang Wang commented on FLINK-13633:
-----------------------------------
Hi [~azagrebin],
Sorry about that. Next time, I will wait for a conclusion in the Jira
discussion and ask to be assigned before working on the PR.
I have moved the issue status to In Progress.
> Move submittedJobGraph and completedCheckpoint to cluster-id subdirectory of
> high-availability storage
> -------------------------------------------------------------------------------------------------------
>
> Key: FLINK-13633
> URL: https://issues.apache.org/jira/browse/FLINK-13633
> Project: Flink
> Issue Type: New Feature
> Components: Runtime / Coordination
> Reporter: Yang Wang
> Assignee: Yang Wang
> Priority: Major
>
> Currently, when high availability is enabled, the HA storage directory is
> laid out as shown below. The submittedJobGraph and completedCheckpoint
> files are stored directly under the HA storage path. This is reasonable when
> the Flink cluster finishes normally. However, when the Yarn application fails
> or is killed, the submittedJobGraph and completedCheckpoint files stay there
> forever, and we cannot even tell which Flink cluster (Yarn application) they
> belong to. So I suggest moving them into a per-application subdirectory.
> External tools could then be used to clean up these residual files.
> Also, we need to do a best-effort clean-up before the Flink cluster finishes.
> Current HA storage directory structure:
> {code:java}
> └── <high-availability.storageDir>
>     ├── submittedJobGraph
>     │   ├── <jobgraph1>(random named)
>     │   └── <jobgraph2>(random named)
>     ├── completedCheckpoint
>     │   ├── <checkpoint1>(random named)
>     │   ├── <checkpoint2>(random named)
>     │   └── <checkpoint3>(random named)
>     └── <high-availability.cluster-id>
>         └── blob
>             └── <blob1>(named as [no_job|job_<job-id>]/blob_<blob-key>)
> {code}
>
> The new HA storage directory structure:
> {code:java}
> └── <high-availability.storageDir>
>     └── <high-availability.cluster-id>
>         ├── submittedJobGraph
>         │   ├── <jobgraph1>(random named)
>         │   └── <jobgraph2>(random named)
>         ├── completedCheckpoint
>         │   ├── <checkpoint1>(random named)
>         │   ├── <checkpoint2>(random named)
>         │   └── <checkpoint3>(random named)
>         └── blob
>             └── <blob1>(named as [no_job|job_<job-id>]/blob_<blob-key>)
> {code}
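To illustrate how an external tool might use the proposed layout, here is a minimal sketch. It is not part of Flink: the class `HaStorageLayout` and both of its methods are hypothetical names. It assumes the HA storage directory is reachable through the local filesystem (a real deployment on HDFS or S3 would need the matching filesystem client) and that the caller already knows the set of cluster-ids that are still alive, for example from the Yarn application list.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for the proposed layout; not a Flink class. */
final class HaStorageLayout {

    private HaStorageLayout() {}

    /** Builds <storageDir>/<clusterId>/<kind>, where kind is e.g. "submittedJobGraph". */
    public static Path clusterSubdir(Path storageDir, String clusterId, String kind) {
        return storageDir.resolve(clusterId).resolve(kind);
    }

    /**
     * Returns the cluster-id directories directly under storageDir whose
     * names are not in liveClusterIds. These are only candidates for
     * clean-up; nothing is deleted here.
     */
    public static List<Path> findResidualClusterDirs(Path storageDir, Set<String> liveClusterIds)
            throws IOException {
        List<Path> residual = new ArrayList<>();
        try (DirectoryStream<Path> children = Files.newDirectoryStream(storageDir)) {
            for (Path child : children) {
                String name = child.getFileName().toString();
                if (Files.isDirectory(child) && !liveClusterIds.contains(name)) {
                    residual.add(child);
                }
            }
        }
        Collections.sort(residual); // deterministic order for reporting
        return residual;
    }
}
```

With the current layout such a tool could not attribute the files under submittedJobGraph and completedCheckpoint to a cluster; with everything under the cluster-id directory, a single directory listing is enough to find the candidates.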