[jira] [Commented] (FLINK-19481) Add support for a flink native GCS FileSystem

Xintong Song (Jira) Wed, 05 May 2021 20:46:06 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-19481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17339981#comment-17339981
 ]


Xintong Song commented on FLINK-19481:
--------------------------------------

Thanks for the discussion.

I agree with Ben that Galen's PR mostly focuses on adding support for 
RecoverableWriter and does not depend on the specific file system 
implementation. Migrating from the hadoop to a native gcs file system 
implementation should not require much rework.

However, before deciding on doing it, I'd like to understand the benefits of 
implementing a native gcs file system. Galen's PR does introduce a 
GSFileSystem, which simply wraps GoogleHadoopFileSystem. It seems to me this 
already solves most of the problems.
- "gs://" scheme can be supported
- Users no longer need to deal with the dependencies and FileSystem 
hierarchies. They should simply add the new flink-gs-fs-hadoop artifact, and 
ideally everything else needed should be included.

Are there any significant benefits that I overlooked and that can only be 
achieved by a native gcs file system implementation rather than the 
GSFileSystem in Galen's PR?

> Add support for a flink native GCS FileSystem
> ---------------------------------------------
>
>                 Key: FLINK-19481
>                 URL: https://issues.apache.org/jira/browse/FLINK-19481
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / FileSystem, FileSystems
>    Affects Versions: 1.12.0
>            Reporter: Ben Augarten
>            Priority: Minor
>              Labels: auto-deprioritized-major
>
> Currently, GCS is supported, but only by using the hadoop connector [1].
>  
> The objective of this improvement is to add support for checkpointing to 
> Google Cloud Storage with the Flink File System.
>  
> This would allow the `gs://` scheme to be used for savepointing and 
> checkpointing. Long term, it would be nice if we could use the GCS FileSystem 
> as a source and sink in flink jobs as well. 
>  
> Long term, I hope that implementing a flink native GCS FileSystem will 
> simplify usage of GCS because the hadoop FileSystem ends up bringing in many 
> unshaded dependencies.
>  
> [1] https://github.com/GoogleCloudDataproc/hadoop-connectors



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
