[
https://issues.apache.org/jira/browse/FLINK-17808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17312029#comment-17312029
]
xiaogang zhou edited comment on FLINK-17808 at 3/31/21, 3:39 AM:
-----------------------------------------------------------------
[~yunta] As Steven mentioned above, of the two solutions, the better one is to
use a RecoverableWriter.
We can follow the streamFileWriter's approach and reuse the existing
RecoverableFsDataOutputStream.
We do not actually need the recover function, only the commit function, so
the closeForCommit function is enough for our case.
Status for each file system implementation:
1, file: ok
2, hdfs: ok
3, S3: ok
4, oss: I think this is the Alibaba Cloud FS, and I think we can implement a
recoverable writer for it.
was (Author: zhoujira86):
[~yunta] As Steven mentioned above, of the two solutions, the better one is to
use a RecoverableWriter.
We can follow the streamFileWriter's approach and reuse the existing
RecoverableFsDataOutputStream.
We do not actually need the recover function, only the commit function, so
the closeForCommit function is enough for our case.
Status for each file system implementation:
1, file: ok
2, hdfs: ok
3, S3: ok
4, oss: I am not sure about this fs; we can discuss whether we should implement
a Recoverable stream for it.
> Rename checkpoint meta file to "_metadata" until it has completed writing
> -------------------------------------------------------------------------
>
> Key: FLINK-17808
> URL: https://issues.apache.org/jira/browse/FLINK-17808
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Checkpointing
> Affects Versions: 1.10.0
> Reporter: Yun Tang
> Priority: Major
> Fix For: 1.14.0
>
>
> In practice, some developers or customers use a strategy that finds the most
> recent _metadata file as the checkpoint to recover from (e.g. as many proposals
> in FLINK-9043 suggest). However, the existence of a "_metadata" file does not
> mean that the checkpoint has completed, because writing the "_metadata" file
> could be interrupted by a forced quit (e.g. yarn application -kill).
> We could have the checkpoint meta stream write data to a file named
> "_metadata.inprogress" and rename it to "_metadata" once writing has completed.
> By doing so, we could ensure that the "_metadata" file is not broken.
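
The write-then-rename idea in the description above can be sketched roughly as
follows. This is only an illustration on a local file system using plain
java.nio, not Flink's FileSystem abstraction or its RecoverableWriter; the class
name MetadataRenameSketch and its methods are hypothetical. It also assumes the
rename is atomic, which holds for a same-directory move on typical local file
systems but is not something this sketch shows for S3 or OSS.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration only; not part of Flink.
public class MetadataRenameSketch {

    public static final String IN_PROGRESS_NAME = "_metadata.inprogress";
    public static final String FINAL_NAME = "_metadata";

    /**
     * Writes the content to "_metadata.inprogress" and renames it to
     * "_metadata" only after the write has finished without error.
     */
    public static Path writeMetadata(Path checkpointDir, byte[] content) {
        try {
            Files.createDirectories(checkpointDir);
            Path inProgress = checkpointDir.resolve(IN_PROGRESS_NAME);
            Path finalPath = checkpointDir.resolve(FINAL_NAME);
            // If the process is killed during this write, only the
            // in-progress file is left behind.
            Files.write(inProgress, content);
            // Publish the file under its final name in a single step.
            return Files.move(inProgress, finalPath, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** A checkpoint directory counts as completed only if "_metadata" exists. */
    public static boolean isCompleted(Path checkpointDir) {
        return Files.isRegularFile(checkpointDir.resolve(FINAL_NAME));
    }

    public static void main(String[] args) throws IOException {
        Path completed = Files.createTempDirectory("chk-completed-");
        writeMetadata(completed, "checkpoint metadata".getBytes(StandardCharsets.UTF_8));
        System.out.println("completed dir usable: " + isCompleted(completed));

        // Simulate a writer that was killed mid-write: only the
        // in-progress file exists, so the directory is not picked up.
        Path interrupted = Files.createTempDirectory("chk-interrupted-");
        Files.write(interrupted.resolve(IN_PROGRESS_NAME), new byte[] {0x01});
        System.out.println("interrupted dir usable: " + isCompleted(interrupted));
    }
}
```

With this layout, a tool that scans for the most recent "_metadata" file never
sees a partially written one; a leftover "_metadata.inprogress" simply marks an
attempt that did not complete.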
--
This message was sent by Atlassian Jira
(v8.3.4#803005)