[https://issues.apache.org/jira/browse/FLINK-19481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17338418#comment-17338418]
Galen Warren commented on FLINK-19481:
--------------------------------------
Hi all, I'm the author of the other
[PR|https://github.com/apache/flink/pull/15599] that relates to Google Cloud
Storage. [~xintongsong] has been working with me on this.
The main goal of my PR is to add support for the RecoverableWriter interface,
so that one can write to GCS via a StreamingFileSink. The file system support
goes through the Hadoop stack, as noted above, using Google's [cloud storage
connector|https://cloud.google.com/dataproc/docs/concepts/connectors/cloud-storage].
I have not personally had problems using the GCS connector and the Hadoop stack;
it seems to write checkpoints and savepoints properly. I also use it to write
job manager HA data to GCS, which works fine.
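For context, pointing checkpoints, savepoints, and HA data at GCS is purely a
matter of configuration once the {{gs://}} scheme is available; a minimal
flink-conf.yaml sketch (bucket name and paths are hypothetical) might look like:

```yaml
# Hypothetical bucket/paths; the gs:// scheme is served by the
# Hadoop-based connector once it is on the classpath.
state.checkpoints.dir: gs://my-bucket/checkpoints
state.savepoints.dir: gs://my-bucket/savepoints
high-availability.storageDir: gs://my-bucket/ha
```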
However, if we do want to support a native implementation in addition to the
Hadoop-based one, we could approach it similarly to what has been done for S3,
i.e. have a shared base project (flink-gs-fs-base?) and then projects for each
of the implementations (flink-gs-fs-hadoop and flink-gs-fs-native?). The
recoverable-writer code could go into the shared project so that both of the
implementations could use it (assuming that the native implementation doesn't
already have a recoverable-writer implementation).
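The layout proposed above could be sketched as a parent-pom fragment; the
module names are the tentative ones from this comment, not existing artifacts:

```xml
<!-- Hypothetical parent-pom fragment mirroring the flink-s3-fs-* layout. -->
<modules>
  <!-- shared code, including the recoverable-writer implementation -->
  <module>flink-gs-fs-base</module>
  <!-- file system backed by Google's Hadoop cloud storage connector -->
  <module>flink-gs-fs-hadoop</module>
  <!-- native implementation built directly against GCS -->
  <module>flink-gs-fs-native</module>
</modules>
```

This mirrors how flink-s3-fs-base is shared by flink-s3-fs-hadoop and
flink-s3-fs-presto today.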
I'll defer to the Flink experts on whether that's a worthwhile effort or not.
At this point, from my perspective, it wouldn't be that much work to rework the
project structure to support this.
> Add support for a flink native GCS FileSystem
> ---------------------------------------------
>
> Key: FLINK-19481
> URL: https://issues.apache.org/jira/browse/FLINK-19481
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / FileSystem, FileSystems
> Affects Versions: 1.12.0
> Reporter: Ben Augarten
> Priority: Minor
> Labels: auto-deprioritized-major
>
> Currently, GCS is supported, but only by using the Hadoop connector [1].
>
> The objective of this improvement is to add support for checkpointing to
> Google Cloud Storage with the Flink FileSystem abstraction.
>
> This would allow the `gs://` scheme to be used for savepointing and
> checkpointing. Long term, it would be nice if we could use the GCS FileSystem
> as a source and sink in Flink jobs as well.
>
> Long term, I hope that implementing a Flink-native GCS FileSystem will
> simplify usage of GCS, because the Hadoop FileSystem ends up bringing in many
> unshaded dependencies.
>
> [1]
> [https://github.com/GoogleCloudDataproc/hadoop-connectors]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)