[ https://issues.apache.org/jira/browse/FLINK-13633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921239#comment-16921239 ]
Yang Wang commented on FLINK-13633:
-----------------------------------
[~azagrebin]
I have submitted a PR. Could you please help review it?
Thanks.
> Move submittedJobGraph and completedCheckpoint to cluster-id subdirectory of
> high-availability storage
> -------------------------------------------------------------------------------------------------------
>
> Key: FLINK-13633
> URL: https://issues.apache.org/jira/browse/FLINK-13633
> Project: Flink
> Issue Type: New Feature
> Components: Runtime / Coordination
> Reporter: Yang Wang
> Priority: Major
>
> Currently, when high availability is enabled, the HA storage directory is
> structured as shown below. The submittedJobGraph and completedCheckpoint
> files are stored directly under the HA storage path. This is fine when the
> Flink cluster finishes normally. However, when the YARN application fails
> or is killed, the submittedJobGraph and completedCheckpoint files stay there
> forever, and we cannot even tell which Flink cluster (YARN application) they
> belong to. So I suggest moving them into a per-application subdirectory
> named by the cluster-id. External tools could then be used to clean up
> these residual files.
> Also, we need to do a best-effort clean-up before the Flink cluster finishes.
> Current HA storage directory structure:
> {code:java}
> └── <high-availability.storageDir>
>     ├── submittedJobGraph
>     │   ├── <jobgraph1>(random named)
>     │   └── <jobgraph2>(random named)
>     ├── completedCheckpoint
>     │   ├── <checkpoint1>(random named)
>     │   ├── <checkpoint2>(random named)
>     │   └── <checkpoint3>(random named)
>     └── <high-availability.cluster-id>
>         └── blob
>             └── <blob1>(named as [no_job|job_<job-id>]/blob_<blob-key>)
> {code}
>
> The new HA storage directory structure:
> {code:java}
> └── <high-availability.storageDir>
>     └── <high-availability.cluster-id>
>         ├── submittedJobGraph
>         │   ├── <jobgraph1>(random named)
>         │   └── <jobgraph2>(random named)
>         ├── completedCheckpoint
>         │   ├── <checkpoint1>(random named)
>         │   ├── <checkpoint2>(random named)
>         │   └── <checkpoint3>(random named)
>         └── blob
>             └── <blob1>(named as [no_job|job_<job-id>]/blob_<blob-key>)
> {code}
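
With everything grouped under the cluster-id, an external clean-up tool only needs to compare the subdirectories of the HA storage path against the set of clusters that are still alive. The following is a minimal sketch of that idea, not part of the PR. It assumes the HA storage is reachable as a local (or mounted) filesystem path and that the caller already knows the active cluster-ids. The function names and the `application_000N` ids are hypothetical, chosen only for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def find_residual_cluster_dirs(storage_dir, active_cluster_ids):
    """Return the cluster-id subdirectories that no active cluster owns."""
    active = set(active_cluster_ids)
    return sorted(
        child
        for child in Path(storage_dir).iterdir()
        if child.is_dir() and child.name not in active
    )


def clean_up(storage_dir, active_cluster_ids, dry_run=True):
    """Report residual cluster directories and delete them unless dry_run is set."""
    residual = find_residual_cluster_dirs(storage_dir, active_cluster_ids)
    if not dry_run:
        for path in residual:
            # Removes submittedJobGraph, completedCheckpoint and blob together.
            shutil.rmtree(path)
    return [path.name for path in residual]


def demo():
    """Build the proposed layout in a temporary directory and clean it up."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical cluster-ids; the first one stands for a killed application.
        for cluster_id in ("application_0001", "application_0002"):
            for sub in ("submittedJobGraph", "completedCheckpoint", "blob"):
                (Path(root) / cluster_id / sub).mkdir(parents=True)
        removed = clean_up(root, {"application_0002"}, dry_run=False)
        remaining = sorted(p.name for p in Path(root).iterdir())
        return removed, remaining
```

`dry_run` defaults to `True` so that a first run only lists the candidates. With the current flat layout such a tool would not be possible, because the randomly named job graph and checkpoint files carry no hint of their owning cluster.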
--
This message was sent by Atlassian Jira
(v8.3.2#803003)