[https://issues.apache.org/jira/browse/FLINK-19481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338418#comment-17338418]
Galen Warren commented on FLINK-19481:
--------------------------------------
Hi all, I'm the author of the other
[PR|https://github.com/apache/flink/pull/15599] that relates to Google Cloud
Storage. [~xintongsong] has been working with me on this.
The main goal of my PR is to add support for the RecoverableWriter interface,
so that one can write to GCS via a StreamingFileSink. The file system support
goes through the Hadoop stack, as noted above, using Google's [cloud storage
connector|https://cloud.google.com/dataproc/docs/concepts/connectors/cloud-storage].
I have not personally had problems using the GCS connector and the Hadoop stack;
it seems to write checkpoints and savepoints properly. I also use it to write
job manager HA data to GCS, which works fine.
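For context, pointing checkpoints, savepoints, and HA data at GCS is purely a
matter of configuration once the {{gs://}} scheme is available; a minimal
flink-conf.yaml sketch (bucket name and paths are hypothetical) might look like:

```yaml
# Hypothetical bucket/paths; the gs:// scheme is served by the
# Hadoop-based connector once it is on the classpath.
state.checkpoints.dir: gs://my-bucket/checkpoints
state.savepoints.dir: gs://my-bucket/savepoints
high-availability.storageDir: gs://my-bucket/ha
```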
However, if we do want to support a native implementation in addition to the
Hadoop-based one, we could approach it similarly to what has been done for S3,
i.e. have a shared base project (flink-gs-fs-base?) and then projects for each
of the implementations (flink-gs-fs-hadoop and flink-gs-fs-native?). The
recoverable-writer code could go into the shared project so that both of the
implementations could use it (assuming that the native implementation doesn't
already have a recoverable-writer implementation).
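The layout proposed above could be sketched as a parent-pom fragment; the
module names are the tentative ones from this comment, not existing artifacts:

```xml
<!-- Hypothetical parent-pom fragment mirroring the flink-s3-fs-* layout. -->
<modules>
  <!-- shared code, including the recoverable-writer implementation -->
  <module>flink-gs-fs-base</module>
  <!-- file system backed by Google's Hadoop cloud storage connector -->
  <module>flink-gs-fs-hadoop</module>
  <!-- native implementation built directly against GCS -->
  <module>flink-gs-fs-native</module>
</modules>
```

This mirrors how flink-s3-fs-base is shared by flink-s3-fs-hadoop and
flink-s3-fs-presto today.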
I'll defer to the Flink experts on whether that's a worthwhile effort or not.
At this point, from my perspective, it wouldn't be that much work to rework the
project structure to support this.
> Add support for a flink native GCS FileSystem
> ---------------------------------------------
>
> Key: FLINK-19481
> URL: https://issues.apache.org/jira/browse/FLINK-19481
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / FileSystem, FileSystems
> Affects Versions: 1.12.0
> Reporter: Ben Augarten
> Priority: Minor
> Labels: auto-deprioritized-major
>
> Currently, GCS is supported, but only by using the Hadoop connector [1].
>
> The objective of this improvement is to add support for checkpointing to
> Google Cloud Storage with the Flink FileSystem abstraction.
>
> This would allow the `gs://` scheme to be used for savepointing and
> checkpointing. Long term, it would be nice if we could use the GCS FileSystem
> as a source and sink in Flink jobs as well.
>
> Long term, I hope that implementing a Flink-native GCS FileSystem will
> simplify usage of GCS, because the Hadoop FileSystem ends up bringing in many
> unshaded dependencies.
>
> [1]
> [https://github.com/GoogleCloudDataproc/hadoop-connectors]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)