[ 
https://issues.apache.org/jira/browse/FLINK-11838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280730#comment-17280730
 ] 

Xintong Song commented on FLINK-11838:
--------------------------------------

Thanks [~galenwarren].

Your plans regarding the temporary blobs and the resumable upload time limit 
sound good to me.

Concerning the serialization issue, I'm not saying we should go for the REST 
API approach. I'm just trying to understand what options we have and their pros 
and cons. TBH, I'm not entirely sure why we had this code quality rule against 
Java serialization in the first place. I can see some downsides to using Java 
serialization, which might not be a big problem in this case. But it would be 
better to understand the original purpose, in case we overlook something.
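
For illustration only, here is a minimal sketch of the two styles under 
discussion: Java serialization versus an explicit, versioned byte format. The 
{{UploadState}} class, its fields, and all method names are hypothetical 
stand-ins, not the actual GCS client or Flink types, and the sketch uses only 
the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ResumableStateSerialization {

    // Hypothetical stand-in for the state a recoverable writer would persist.
    static final class UploadState implements Serializable {
        private static final long serialVersionUID = 1L;
        final String uploadId;
        final long bytesWritten;

        UploadState(String uploadId, long bytesWritten) {
            this.uploadId = uploadId;
            this.bytesWritten = bytesWritten;
        }
    }

    // Option 1: Java serialization. Little code, but the byte format is tied
    // to the class definition and is not under our control.
    static byte[] javaSerialize(UploadState state) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(state);
        }
        return bytes.toByteArray();
    }

    static UploadState javaDeserialize(byte[] data)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (UploadState) in.readObject();
        }
    }

    // Option 2: explicit format with a leading version number, so the layout
    // can evolve without depending on Java class metadata.
    static final int VERSION = 1;

    static byte[] explicitSerialize(UploadState state) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(VERSION);
            out.writeUTF(state.uploadId);
            out.writeLong(state.bytesWritten);
        }
        return bytes.toByteArray();
    }

    static UploadState explicitDeserialize(byte[] data) throws IOException {
        try (DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(data))) {
            int version = in.readInt();
            if (version != VERSION) {
                throw new IOException("Unsupported version: " + version);
            }
            String uploadId = in.readUTF();
            long bytesWritten = in.readLong();
            return new UploadState(uploadId, bytesWritten);
        }
    }

    public static void main(String[] args) throws Exception {
        UploadState state = new UploadState("upload-123", 42L);
        UploadState viaJava = javaDeserialize(javaSerialize(state));
        UploadState viaExplicit = explicitDeserialize(explicitSerialize(state));
        System.out.println(viaJava.uploadId + " " + viaJava.bytesWritten);
        System.out.println(viaExplicit.uploadId + " " + viaExplicit.bytesWritten);
    }
}
```

The explicit format is only possible when the state's fields are accessible. 
If the state is an opaque {{Serializable}} object from a third-party library, 
the first style may be the only direct option.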

I'll try to talk to some veterans to see if we can find out what the original 
concerns were. I think investigating the REST approach can proceed concurrently.

Regarding the HTTP/REST client, Netty's {{HttpRequest}} is used by some of 
Flink's runtime components. Please be aware that using {{flink-shaded-netty}} 
is preferred over depending on Netty directly.

> Create RecoverableWriter for GCS
> --------------------------------
>
>                 Key: FLINK-11838
>                 URL: https://issues.apache.org/jira/browse/FLINK-11838
>             Project: Flink
>          Issue Type: New Feature
>          Components: Connectors / FileSystem
>    Affects Versions: 1.8.0
>            Reporter: Fokko Driesprong
>            Assignee: Galen Warren
>            Priority: Major
>              Labels: pull-request-available, usability
>             Fix For: 1.13.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> GCS supports resumable uploads, which we can use to create a 
> RecoverableWriter similar to the S3 implementation:
> https://cloud.google.com/storage/docs/json_api/v1/how-tos/resumable-upload
> After using the Hadoop-compatible interface: 
> https://github.com/apache/flink/pull/7519
> we noticed that the current implementation relies heavily on renaming 
> files on commit: 
> https://github.com/apache/flink/blob/master/flink-filesystems/flink-hadoop-fs/src/main/java/org/apache/flink/runtime/fs/hdfs/HadoopRecoverableFsDataOutputStream.java#L233-L259
> This is suboptimal on an object store such as GCS. Therefore, we would like 
> to implement a more GCS-native RecoverableWriter.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
