[
https://issues.apache.org/jira/browse/FLINK-19481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17339981#comment-17339981
]
Xintong Song commented on FLINK-19481:
--------------------------------------
Thanks for the discussion.
I agree with Ben that Galen's PR mostly focuses on adding support for
RecoverableWriter and does not depend on the specific file system
implementation. Migrating from the Hadoop-based to a native GCS file system
implementation should not require much rework.
However, before deciding on doing it, I'd like to understand the benefits of
implementing a native GCS file system. Galen's PR does introduce a
GSFileSystem, which simply wraps GoogleHadoopFileSystem. It seems to me this
already solves most of the problems.
- "gs://" scheme can be supported
- Users no longer need to deal with the dependencies and FileSystem
hierarchies. They should simply add the new flink-gs-fs-hadoop artifact, and
ideally everything else needed should be included.
Are there any significant benefits that I overlooked, which can only be
achieved by a native GCS file system implementation rather than the
GSFileSystem in Galen's PR?
> Add support for a flink native GCS FileSystem
> ---------------------------------------------
>
> Key: FLINK-19481
> URL: https://issues.apache.org/jira/browse/FLINK-19481
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / FileSystem, FileSystems
> Affects Versions: 1.12.0
> Reporter: Ben Augarten
> Priority: Minor
> Labels: auto-deprioritized-major
>
> Currently, GCS is supported, but only by using the hadoop connector[1].
>
> The objective of this improvement is to add support for checkpointing to
> Google Cloud Storage with the Flink File System.
>
> This would allow the `gs://` scheme to be used for savepointing and
> checkpointing. Long term, it would be nice if we could use the GCS FileSystem
> as a source and sink in flink jobs as well.
>
> Long term, I hope that implementing a flink native GCS FileSystem will
> simplify usage of GCS because the hadoop FileSystem ends up bringing in many
> unshaded dependencies.
>
> [1]
> https://github.com/GoogleCloudDataproc/hadoop-connectors