[jira] [Resolved] (FLINK-13633) Move submittedJobGraph and completedCheckpoint to cluster-id subdirectory of high-availability storage

Till Rohrmann (Jira) Tue, 17 Sep 2019 05:47:18 -0700


     [ 
https://issues.apache.org/jira/browse/FLINK-13633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Till Rohrmann resolved FLINK-13633.
-----------------------------------
    Fix Version/s: 1.10.0
     Release Note: All highly available artifacts written by Apache Flink are 
now stored under `HA_STORAGE_DIR/HA_CLUSTER_ID`, where `HA_STORAGE_DIR` is 
configured by `high-availability.storageDir` and `HA_CLUSTER_ID` is configured by 
`high-availability.cluster-id`.
       Resolution: Done
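
For illustration only, here is a minimal sketch of how the path in the release note is composed from the two configuration keys. The function name `ha_cluster_storage_path` and the plain-dict configuration are assumptions made for this example; this is not Flink's own code or API.

```python
def ha_cluster_storage_path(conf):
    """Return HA_STORAGE_DIR/HA_CLUSTER_ID for a dict of Flink-style settings.

    Hypothetical helper: it only mirrors the layout described in the
    release note and does not call into Flink.
    """
    storage_dir = conf["high-availability.storageDir"]
    cluster_id = conf["high-availability.cluster-id"]
    # Normalize slashes so that exactly one separator joins the two parts.
    return storage_dir.rstrip("/") + "/" + cluster_id.strip("/")


if __name__ == "__main__":
    example = {
        "high-availability.storageDir": "hdfs:///flink/ha/",
        "high-availability.cluster-id": "application_1565000000000_0001",
    }
    print(ha_cluster_storage_path(example))
```

The storage directory and cluster ID in the usage example are made-up values.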

Done via

8393c9670246c28adc4a254d3d486c8a9857a182
96563401b9924cd8800360bdbce93230b921e1ac

> Move submittedJobGraph and completedCheckpoint to cluster-id subdirectory of  
> high-availability storage
> -------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-13633
>                 URL: https://issues.apache.org/jira/browse/FLINK-13633
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>            Reporter: Yang Wang
>            Assignee: Yang Wang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.10.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, when high availability is enabled, the HA storage directory is 
> laid out as shown below. The submittedJobGraph and completedCheckpoint 
> files are stored directly under the HA storage path. This is fine when the 
> Flink cluster finishes normally. However, when the YARN application fails 
> or is killed, the submittedJobGraph and completedCheckpoint files remain 
> there forever, and we cannot even tell which Flink cluster (YARN 
> application) they belong to. I therefore suggest moving them into a 
> per-application subdirectory, so that external tools can clean up these 
> residual files.
> Also, we need to do a best-effort clean-up before the Flink cluster finishes. 
> Current HA storage directory structure:
> {code:java}
> └── <high-availability.storageDir>
>     ├── submittedJobGraph
>     │   ├── <jobgraph1> (randomly named)
>     │   └── <jobgraph2> (randomly named)
>     ├── completedCheckpoint
>     │   ├── <checkpoint1> (randomly named)
>     │   ├── <checkpoint2> (randomly named)
>     │   └── <checkpoint3> (randomly named)
>     └── <high-availability.cluster-id>
>         └── blob
>             └── <blob1> (named as [no_job|job_<job-id>]/blob_<blob-key>)
> {code}
>  
> New HA storage directory structure:
> {code:java}
> └── <high-availability.storageDir>
>     └── <high-availability.cluster-id>
>         ├── submittedJobGraph
>         │   ├── <jobgraph1> (randomly named)
>         │   └── <jobgraph2> (randomly named)
>         ├── completedCheckpoint
>         │   ├── <checkpoint1> (randomly named)
>         │   ├── <checkpoint2> (randomly named)
>         │   └── <checkpoint3> (randomly named)
>         └── blob
>             └── <blob1> (named as [no_job|job_<job-id>]/blob_<blob-key>)
> {code}
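
With every cluster's artifacts grouped under its own cluster-id directory, an external clean-up tool of the kind mentioned in the description could be sketched roughly as follows. This is only an illustration under stated assumptions: it works on a local or mounted filesystem path, the function names are hypothetical, and the set of still-active cluster IDs must be supplied by the caller (for example from the resource manager). It is not part of Flink.

```python
import shutil
from pathlib import Path


def find_stale_cluster_dirs(storage_dir, active_cluster_ids):
    """List cluster-id subdirectories whose names are not in the active set."""
    root = Path(storage_dir)
    return sorted(
        p for p in root.iterdir()
        if p.is_dir() and p.name not in active_cluster_ids
    )


def remove_stale_cluster_dirs(storage_dir, active_cluster_ids, dry_run=True):
    """Delete stale cluster-id subdirectories and return the stale paths.

    With dry_run=True (the default) nothing is deleted; the stale
    directories are only reported.
    """
    stale = find_stale_cluster_dirs(storage_dir, active_cluster_ids)
    if not dry_run:
        for path in stale:
            # Removes submittedJobGraph, completedCheckpoint and blob together.
            shutil.rmtree(path)
    return stale
```

The dry-run default is a deliberate choice for this sketch: deleting a directory that belongs to a running cluster would destroy its recovery state, so the caller has to opt in to deletion explicitly.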



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
