[jira] [Comment Edited] (FLINK-17808) Rename checkpoint meta file to "_metadata" until it has completed writing

xiaogang zhou (Jira) Tue, 30 Mar 2021 20:40:07 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-17808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17312029#comment-17312029
 ]


xiaogang zhou edited comment on FLINK-17808 at 3/31/21, 3:39 AM:
-----------------------------------------------------------------

[~yunta] As Steven mentioned above, of the two solutions, the better one is to 
use a RecoverableWriter.

 

We can follow the streamFileWriter's approach and use the existing 
RecoverableFsDataOutputStream.

We don't really need the recover function, only the commit function, so the 
closeForCommit function is enough for our case.

For each file system implementation:

1, file: ok

2, hdfs: ok

3, S3: ok

4, oss: I think this is an Alibaba Cloud FS, and I think we can implement a 
recoverable writer for it.

 


was (Author: zhoujira86):
[~yunta] As Steven mentioned above, between two solutions, the better one is to 
use a RecoverableWriter.

 

We can refer to the streamFileWriter's approach, use the existing 
RecoverableFsDataOutputStream.

We don't really use the recover function, only need the commit function. And 
the closeForCommit  function is enough for our case.

And for each implement:

1, file: ok

2, hdfs: ok

3, S3: ok

4, oss: I am not sure about the fs, we can discuss whether we should implement 
a Recoverable stream for it.

 

> Rename checkpoint meta file to "_metadata" until it has completed writing
> -------------------------------------------------------------------------
>
>                 Key: FLINK-17808
>                 URL: https://issues.apache.org/jira/browse/FLINK-17808
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.10.0
>            Reporter: Yun Tang
>            Priority: Major
>             Fix For: 1.14.0
>
>
> In practice, some developers or customers use some strategy to find the most 
> recent _metadata file as the checkpoint to recover from (e.g. as many proposals 
> in FLINK-9043 suggest). However, the existence of a "_metadata" file does not 
> mean that the checkpoint has completed, because the write that creates the 
> "_metadata" file could be interrupted by a forced quit (e.g. yarn application -kill).
> We could have the checkpoint metadata stream write to a file named 
> "_metadata.inprogress" and rename it to "_metadata" once writing has completed. 
> By doing so, we could ensure that the "_metadata" file is not broken.
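
The write-then-rename idea in the description can be sketched roughly as follows. This is only an illustration on a local file system using java.nio, not Flink's checkpoint storage code: the class name MetadataRenameSketch and the method writeMetadata are hypothetical, and the sketch assumes the underlying file system supports an atomic move. Object stores such as S3 have no atomic rename, which is one reason the comment above leans towards a RecoverableWriter instead.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, for illustration only; not part of Flink.
final class MetadataRenameSketch {

    static final String FINAL_NAME = "_metadata";
    static final String IN_PROGRESS_NAME = "_metadata.inprogress";

    /**
     * Writes the payload to "_metadata.inprogress" and renames it to
     * "_metadata" only after the write has finished. A reader that sees
     * "_metadata" therefore never sees a half-written file.
     */
    static Path writeMetadata(Path checkpointDir, byte[] payload) throws IOException {
        Path inProgress = checkpointDir.resolve(IN_PROGRESS_NAME);
        Path target = checkpointDir.resolve(FINAL_NAME);

        // Step 1: write everything to the in-progress file.
        Files.write(inProgress, payload);

        // Step 2: publish the file under its final name in one step.
        // ATOMIC_MOVE throws AtomicMoveNotSupportedException if the
        // file system cannot do this atomically.
        return Files.move(inProgress, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("chk-");
        Path meta = writeMetadata(dir, "checkpoint metadata".getBytes(StandardCharsets.UTF_8));
        System.out.println("final file exists: " + Files.exists(meta));
        System.out.println("in-progress file exists: "
                + Files.exists(dir.resolve(IN_PROGRESS_NAME)));
    }
}
```

If the process is killed during step 1, only "_metadata.inprogress" is left behind, so a strategy that looks for the most recent "_metadata" file would skip that checkpoint.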



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
