Re: [DISCUSSION] Native GCS Filesystem in Apache Flink

Samrat Deb Tue, 23 Jun 2026 01:46:54 -0700

Thank you, Aleksandr, for adding more to the proposal.
Looking forward to collaborating on this project.


Best,
Samrat

On Mon, Jun 22, 2026 at 10:46 PM Aleksandr Iushmanov <[email protected]>
wrote:

> Hi Samrat,
>
> Thank you for working on this. I agree that the community would benefit
> from the introduction of a native filesystem implementation, for reasons
> similar to those raised in [1]. I am actively working on an "Umbrella"
> FLIP for cross-cloud support, and your proposal naturally fills the gap
> for GCS.
>
> Speaking of pain points related to the Hadoop connectors, I would like to
> mention:
> 1. Complexity of CVE management.
> 2. Challenges with dependency upgrades, including Java version upgrades.
> 3. Lack of support for client-side encryption with custom key providers
> (especially in a cross-cloud manner).
>
> I am looking forward to collaborating with you on Hadoop-free Flink
> filesystem support.
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-555%3A+Flink+Native+S3+FileSystem
>
> Kind regards,
> Alex
>
>
> On Mon, 22 Jun 2026 at 06:44, Samrat Deb <[email protected]> wrote:
>
> > Hi all,
> >
> > Poorvank (cc'ed) and I would like to start a discussion about a potential
> > improvement to Flink's Google Cloud Storage integration: creating a
> > native GCS filesystem that is independent of Hadoop. We were able to do
> > this earlier for S3 [1].
> >
> > The overall aim is to move toward a Hadoop-free Flink filesystem and
> > unlock potential performance benefits for Flink's specific requirements.
> >
> > The goal of this proposal is to explore whether Flink would benefit from
> > a first-class GCS filesystem implementation built directly on top of
> > Google Cloud Storage client libraries rather than relying on the Hadoop
> > connector. If the discussion gains positive traction, the next step would
> > be to prepare a formal FLIP.
> >
> > The Current State
> > Today, Flink's GCS support is provided through flink-gs-fs-hadoop [2],
> > which is based on Google's Cloud Storage Hadoop connector [3].
> >
> > This approach has served Flink well, but it also introduces some
> > limitations:
> >
> >    1. Flink's GCS integration depends on the Hadoop filesystem abstraction
> >       and the Hadoop-based GCS connector. As a result, upgrades and
> >       feature adoption are tied to the evolution of those external
> >       components.
> >    2. The dependency stack is larger than necessary for users who only
> >       require Google Cloud Storage support. In practice, users must bring
> >       in Hadoop-based components even though the underlying storage
> >       system is an object store.
> >    3. Leveraging new capabilities from Google Cloud Storage often requires
> >       waiting for support to become available through the Hadoop
> >       connector before Flink can benefit from them.
> >
> > Proposed Direction
> >
> > I would like to explore the feasibility of a new filesystem
> > implementation, tentatively named flink-gs-fs-native, built directly on
> > top of Google Cloud Storage client libraries.
> >
> > The goals would be:
> >
> >    1. Provide a Hadoop-independent implementation of Flink's FileSystem
> >       API for Google Cloud Storage.
> >    2. Reduce dependency complexity and make the GCS integration easier to
> >       maintain and evolve.
> >    3. Allow Flink to adopt new Google Cloud Storage features and
> >       performance improvements directly, without depending on Hadoop
> >       abstractions.
> >    4. Continue supporting Flink features such as checkpointing,
> >       savepoints, state backends, and file sinks through a native
> >       implementation.
> >
> > A Possible Migration Path
> >
> > To ensure a smooth transition, a phased approach could be considered:
> >
> > Phase 1:
> > Introduce the native GCS filesystem as an optional plugin alongside the
> > existing flink-gs-fs-hadoop connector.
> >
> > Phase 2:
> > Gather community feedback, validate production readiness, and achieve
> > feature parity with the existing implementation.
> >
> > Phase 3:
> > If the native implementation proves mature and broadly adopted, discuss
> > whether the Hadoop-based implementation should remain, be deprecated, or
> > continue to coexist.
> >
> > Questions for the Community
> >
> >    1. What are the biggest pain points users face today with
> >       flink-gs-fs-hadoop?
> >    2. Are there any critical capabilities provided by the Hadoop-based GCS
> >       connector that would be difficult or undesirable to reimplement?
> >    3. Would a Hadoop-independent GCS filesystem provide meaningful value
> >       for your Flink deployments?
> >    4. Are there specific GCS features or operational concerns that should
> >       be considered from the beginning?
> >
> > Looking forward to hearing the community's thoughts.
> >
> > Best,
> > Samrat
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-555%3A+Flink+Native+S3+FileSystem
> > [2] https://github.com/apache/flink/tree/master/flink-filesystems/flink-gs-fs-hadoop
> > [3] https://github.com/GoogleCloudDataproc/hadoop-connectors/tree/master/gcs
> >
>
