[jira] [Commented] (FLINK-17808) Rename checkpoint meta file to "_metadata" until it has completed writing

Stephan Ewen (Jira) Wed, 20 May 2020 15:54:09 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-17808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17112647#comment-17112647
 ]


Stephan Ewen commented on FLINK-17808:
--------------------------------------

We need to avoid renaming in checkpoints, because it causes 
visibility/consistency issues on some file systems.

We can instead do the following:
  - Use the RecoverableWriter (we don't need the recoverability, but we can use 
its committing feature)
  - Write a "latest checkpoint" file in the checkpoints root which points to 
the latest completed checkpoint

Option two would also be a simple way to implement a generic "resume latest" 
feature for the CLI.
It would not work reliably on all file systems (S3, for example), but that 
would be less harmful than inconsistent visibility of the checkpoint metadata 
file, which is used by ZK and externalized-checkpoint-based recovery.
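
As a rough illustration of option two, here is a minimal sketch of a "latest 
checkpoint" pointer file, written against plain java.nio rather than Flink's 
FileSystem abstraction. The class name, the "_latest" file name, and both 
method names are assumptions made for this example and are not existing Flink 
API. The pointer is overwritten in place without a rename, so the write is not 
atomic and a reader can still observe a stale or partial pointer on some file 
systems.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

/** Hypothetical helper (not part of Flink) that maintains a pointer file in the checkpoints root. */
final class LatestCheckpointPointer {

    // Assumed file name; the comment above does not prescribe one.
    static final String POINTER_FILE = "_latest";

    /**
     * Records the directory name of the latest completed checkpoint.
     * Intended to be called only after that checkpoint's _metadata file is fully written.
     */
    static void publish(Path checkpointsRoot, String completedCheckpointDir) throws IOException {
        Path pointer = checkpointsRoot.resolve(POINTER_FILE);
        // Plain create-or-truncate write: no rename is involved, but it is not atomic either.
        Files.write(pointer, completedCheckpointDir.getBytes(StandardCharsets.UTF_8));
    }

    /**
     * Resolves the _metadata path of the checkpoint the pointer refers to.
     * Returns empty if the pointer is missing, blank, or refers to a checkpoint
     * whose _metadata file is not visible.
     */
    static Optional<Path> resolveLatest(Path checkpointsRoot) throws IOException {
        Path pointer = checkpointsRoot.resolve(POINTER_FILE);
        if (!Files.exists(pointer)) {
            return Optional.empty();
        }
        String dirName = new String(Files.readAllBytes(pointer), StandardCharsets.UTF_8).trim();
        if (dirName.isEmpty()) {
            return Optional.empty();
        }
        Path metadata = checkpointsRoot.resolve(dirName).resolve("_metadata");
        return Files.exists(metadata) ? Optional.of(metadata) : Optional.empty();
    }
}
```

A "resume latest" CLI feature could call resolveLatest on the checkpoints root 
and fall back to an explicit path when the result is empty.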

> Rename checkpoint meta file to "_metadata" until it has completed writing
> -------------------------------------------------------------------------
>
>                 Key: FLINK-17808
>                 URL: https://issues.apache.org/jira/browse/FLINK-17808
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.10.0
>            Reporter: Yun Tang
>            Assignee: Yun Tang
>            Priority: Major
>             Fix For: 1.12.0
>
>
> In practice, some developers or customers use some strategy to find the most 
> recent _metadata file and recover from that checkpoint (e.g. as many proposals 
> in FLINK-9043 suggest). However, the existence of a "_metadata" file does not 
> mean that the checkpoint has completed, because writing the "_metadata" file 
> can be interrupted by a forced shutdown (e.g. yarn application -kill).
> We could have the checkpoint metadata stream write to a file named 
> "_metadata.inprogress" and rename it to "_metadata" once writing has completed. 
> That way, we could ensure that "_metadata" is never a partially written file.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
