[ 
https://issues.apache.org/jira/browse/FLINK-11838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17279299#comment-17279299
 ] 

Xintong Song commented on FLINK-11838:
--------------------------------------

Hi [~galenwarren],

Thanks for offering to contribute. I will help you with this.

Since this ticket has not been updated for quite some time and the original PR 
has been abandoned, I have assigned you to the ticket.

Just to manage expectations, I will need some time to get up to speed on the 
GCS background and review your design proposal.

In the meantime, I would suggest taking a look at the following guidelines:
 [https://flink.apache.org/contributing/contribute-code.html]
 [https://flink.apache.org/contributing/code-style-and-quality-preamble.html]

After a first glance at the PR, I have two suggestions:
- I noticed you've described your proposal on the PR you've opened. It would be 
nice to move it into the description of this JIRA ticket. Usually, we use the 
JIRA ticket for design discussions, and the PR for reviewing implementation 
details.
- The PR contains about 3k LOC of changes in a single commit, which could be 
hard to review, especially when we cannot communicate face-to-face. It would be 
nice to organize the code into smaller commits following the contribution 
guidelines. This can be done after we reach consensus on the design proposal.

> Create RecoverableWriter for GCS
> --------------------------------
>
>                 Key: FLINK-11838
>                 URL: https://issues.apache.org/jira/browse/FLINK-11838
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / FileSystem
>    Affects Versions: 1.8.0
>            Reporter: Fokko Driesprong
>            Assignee: Galen Warren
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> GCS supports resumable uploads, which we can use to create a RecoverableWriter 
> similar to the S3 implementation:
> https://cloud.google.com/storage/docs/json_api/v1/how-tos/resumable-upload
> After using the Hadoop compatible interface: 
> https://github.com/apache/flink/pull/7519
> We've noticed that the current implementation relies heavily on the renaming 
> of the files on the commit: 
> https://github.com/apache/flink/blob/master/flink-filesystems/flink-hadoop-fs/src/main/java/org/apache/flink/runtime/fs/hdfs/HadoopRecoverableFsDataOutputStream.java#L233-L259
> This is suboptimal on an object store such as GCS. Therefore, we would like to 
> implement a more GCS-native RecoverableWriter.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)