[
https://issues.apache.org/jira/browse/FLINK-11838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17279299#comment-17279299
]
Xintong Song commented on FLINK-11838:
--------------------------------------
Hi [~galenwarren],
Thanks for offering to contribute. I will help you with this contribution.
Since this ticket has not been updated for quite some time and the original PR
has been abandoned, I have assigned you to the ticket.
Just to manage expectations, I will need some time to get up to speed on the
GCS background and review your design proposal.
In the meantime, I would suggest taking a look at the following guidelines:
[https://flink.apache.org/contributing/contribute-code.html]
[https://flink.apache.org/contributing/code-style-and-quality-preamble.html]
After a first glance at the PR, I have two suggestions.
- I noticed you've described your proposal on the PR you've opened. It would be
nice to copy that proposal into the description of this JIRA ticket. Usually, we
use the JIRA ticket for design discussions, and the PR for reviewing
implementation details.
- The PR contains 3k LOC of changes in a single commit, which could be hard to
review, especially when we cannot communicate face-to-face. It would be nice to
organize the code into smaller commits following the contribution guidelines.
This can be done after we reach consensus on the design proposal.
> Create RecoverableWriter for GCS
> --------------------------------
>
> Key: FLINK-11838
> URL: https://issues.apache.org/jira/browse/FLINK-11838
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / FileSystem
> Affects Versions: 1.8.0
> Reporter: Fokko Driesprong
> Assignee: Galen Warren
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> GCS supports resumable uploads, which we can use to create a
> RecoverableWriter similar to the S3 implementation:
> https://cloud.google.com/storage/docs/json_api/v1/how-tos/resumable-upload
> After using the Hadoop compatible interface:
> https://github.com/apache/flink/pull/7519
> We've noticed that the current implementation relies heavily on the renaming
> of the files on the commit:
> https://github.com/apache/flink/blob/master/flink-filesystems/flink-hadoop-fs/src/main/java/org/apache/flink/runtime/fs/hdfs/HadoopRecoverableFsDataOutputStream.java#L233-L259
> This is suboptimal on an object store such as GCS. Therefore we would like to
> implement a more GCS-native RecoverableWriter.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)