[
https://issues.apache.org/jira/browse/FLINK-9752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16618730#comment-16618730
]
Stephan Ewen commented on FLINK-9752:
-------------------------------------
There is a WIP branch:
https://github.com/StephanEwen/incubator-flink/tree/s3_recoverable_writer_2
We are currently finalizing it. At this stage the work does not yet
parallelize well, but once the first version has been added, we will take all
the help we can get to optimize it further.
> Add an S3 RecoverableWriter
> ---------------------------
>
> Key: FLINK-9752
> URL: https://issues.apache.org/jira/browse/FLINK-9752
> Project: Flink
> Issue Type: Sub-task
> Components: Streaming Connectors
> Reporter: Stephan Ewen
> Assignee: Kostas Kloudas
> Priority: Major
> Fix For: 1.7.0, 1.6.2
>
>
> S3 persists data only once an upload is complete, that is, at the end of a
> simple upload or at the end of each part upload in a MultiPartUpload.
> We should implement a RecoverableWriter for S3 that does a MultiPartUpload
> with a Part per checkpoint.
> Recovering the writer needs the MultiPartUploadID and the list of ETags of
> previous parts.
> We need additional staging of data in Flink state to work around the fact that:
> - Parts in a MultiPartUpload must be at least 5MB (except the last part).
> - Part sizes must be known up front. (Note that data can still be streamed
> in the upload.)
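
The staging scheme described above could look roughly like the following. This
is only a minimal sketch under stated assumptions, not the code in the WIP
branch: the class S3RecoverableWriterSketch, the PartUploader interface, and
the Recoverable state holder are all hypothetical names, and the S3 call is
hidden behind an interface rather than using a real client. It shows the three
pieces from the description: one part per checkpoint when enough data is
available, staging of data that is still below the 5MB minimum, and recovery
from the upload ID plus the list of ETags.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these names are Flink or AWS SDK APIs. */
class S3RecoverableWriterSketch {

    /** S3's minimum size for every part except the last one. */
    static final int MIN_PART_SIZE = 5 * 1024 * 1024;

    /** Stand-in for the S3 call; a real version would wrap an S3 client. */
    interface PartUploader {
        /** Uploads one part of known length and returns its ETag. */
        String uploadPart(String uploadId, int partNumber, byte[] data);
    }

    /** What a checkpoint must store so the writer can be recovered. */
    static final class Recoverable {
        final String uploadId;
        final List<String> eTags;   // ETags of parts already persisted in S3
        final byte[] stagedTail;    // bytes still too small to form a part

        Recoverable(String uploadId, List<String> eTags, byte[] stagedTail) {
            this.uploadId = uploadId;
            this.eTags = new ArrayList<>(eTags);
            this.stagedTail = stagedTail.clone();
        }
    }

    private final PartUploader uploader;
    private final String uploadId;
    private final List<String> eTags = new ArrayList<>();
    private final ByteArrayOutputStream staged = new ByteArrayOutputStream();

    S3RecoverableWriterSketch(PartUploader uploader, String uploadId) {
        this.uploader = uploader;
        this.uploadId = uploadId;
    }

    /** Rebuilds a writer from checkpointed state. */
    static S3RecoverableWriterSketch recover(PartUploader uploader, Recoverable state) {
        S3RecoverableWriterSketch w =
                new S3RecoverableWriterSketch(uploader, state.uploadId);
        w.eTags.addAll(state.eTags);
        w.staged.write(state.stagedTail, 0, state.stagedTail.length);
        return w;
    }

    /** Buffers data locally; nothing is persisted in S3 yet. */
    void write(byte[] data) {
        staged.write(data, 0, data.length);
    }

    /** On checkpoint: upload the staged data as a part if it is large enough,
     *  otherwise keep it in the checkpointed state. */
    Recoverable snapshot() {
        if (staged.size() >= MIN_PART_SIZE) {
            byte[] part = staged.toByteArray(); // size is known before the upload
            // S3 part numbers start at 1.
            eTags.add(uploader.uploadPart(uploadId, eTags.size() + 1, part));
            staged.reset();
        }
        return new Recoverable(uploadId, eTags, staged.toByteArray());
    }
}
```

In this sketch a checkpoint with less than 5MB of staged data uploads nothing,
so the staged bytes travel with the checkpoint and are uploaded together with
later data. Completing the MultiPartUpload, where the final part may be smaller
than 5MB, is left out.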