Re: [DISCUSSION] Native GCS Filesystem in Apache Flink

Ryan van Huuksloot via dev Tue, 23 Jun 2026 06:45:33 -0700

Hello,

I wanted to jump in and say that I think this is a great effort. We've had
many issues with Hadoop being a dependency.


Given our other priorities at Shopify, we don't have time to contribute in
2026. However, when an alpha release is available, we would be happy to run
it against our system.

Thanks,
Ryan van Huuksloot
Staff Engineer, Infrastructure | Streaming Platform


On Tue, Jun 23, 2026 at 4:46 AM Samrat Deb <[email protected]> wrote:

> Thank you, Aleksandr, for adding more to the proposal.
> Looking forward to collaborating on this project.
>
> Best,
> Samrat
>
> On Mon, Jun 22, 2026 at 10:46 PM Aleksandr Iushmanov <[email protected]>
> wrote:
>
> > Hi Samrat,
> >
> > Thank you for working on this. I agree that the community would benefit
> > from the introduction of a native filesystem implementation, for reasons
> > similar to those raised in [1]. I am actively working on an "umbrella"
> > FLIP for cross-cloud support, and your proposal naturally fills the gap
> > for GCS.
> >
> > Speaking of pain points related to the Hadoop connectors, I would like to
> > mention:
> > 1. Complexity of CVE management.
> > 2. Challenges with dependency upgrades, including Java version upgrades.
> > 3. Lack of support for client-side encryption with custom key providers
> > (especially in a cross-cloud manner).
> >
> > I am looking forward to collaborating with you on Hadoop-free Flink
> > filesystem support.
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-555%3A+Flink+Native+S3+FileSystem
> >
> > Kind regards,
> > Alex
> >
> >
> > On Mon, 22 Jun 2026 at 06:44, Samrat Deb <[email protected]> wrote:
> >
> > > Hi all,
> > >
> > > Poorvank (cc'ed) and I would like to start a discussion about a potential
> > > improvement to Flink's Google Cloud Storage integration: a native GCS
> > > filesystem independent of Hadoop, similar to what we did earlier for
> > > S3 [1].
> > >
> > > The broader effort is to move toward Hadoop-free Flink filesystems and
> > > unlock potential performance benefits for Flink's core workloads.
> > >
> > > The goal of this proposal is to explore whether Flink would benefit from a
> > > first-class GCS filesystem implementation built directly on top of the
> > > Google Cloud Storage client libraries rather than relying on the Hadoop
> > > connector. If the discussion gains positive traction, the next step would
> > > be to prepare a formal FLIP.
> > >
> > > The Current State
> > > Today, Flink's GCS support is provided through flink-gs-fs-hadoop [2],
> > > which is based on Google's Cloud Storage Hadoop connector [3].
> > >
> > > This approach has served Flink well, but it also introduces some
> > > limitations:
> > >
> > >    1. Flink's GCS integration depends on the Hadoop filesystem abstraction
> > >       and the Hadoop-based GCS connector. As a result, upgrades and
> > >       feature adoption are tied to the evolution of those external
> > >       components.
> > >    2. The dependency stack is larger than necessary for users who only
> > >       require Google Cloud Storage support. In practice, users must bring
> > >       in Hadoop-based components even though the underlying storage
> > >       system is an object store.
> > >    3. Leveraging new capabilities from Google Cloud Storage often
> > >       requires waiting for support to become available through the Hadoop
> > >       connector before Flink can benefit from them.
> > >
> > > Proposed Direction
> > >
> > > I would like to explore the feasibility of a new filesystem
> > > implementation, tentatively named flink-gs-fs-native, built directly on
> > > top of the Google Cloud Storage client libraries.
> > >
> > > The goals would be:
> > >
> > >    1. Provide a Hadoop-independent implementation of Flink's FileSystem
> > >       API for Google Cloud Storage.
> > >    2. Reduce dependency complexity and make the GCS integration easier to
> > >       maintain and evolve.
> > >    3. Allow Flink to adopt new Google Cloud Storage features and
> > >       performance improvements directly, without depending on Hadoop
> > >       abstractions.
> > >    4. Continue supporting Flink features such as checkpointing,
> > >       savepoints, state backends, and file sinks through a native
> > >       implementation.
> > >
> > > A Possible Migration Path
> > >
> > > To ensure a smooth transition, a phased approach could be considered:
> > >
> > > Phase 1:
> > > Introduce the native GCS filesystem as an optional plugin alongside the
> > > existing flink-gs-fs-hadoop connector.
> > >
> > > Phase 2:
> > > Gather community feedback, validate production readiness, and achieve
> > > feature parity with the existing implementation.
> > >
> > > Phase 3:
> > > If the native implementation proves mature and broadly adopted, discuss
> > > whether the Hadoop-based implementation should continue to coexist with
> > > it or be deprecated.
> > >
> > > Questions for the Community
> > >
> > >    1. What are the biggest pain points users face today with
> > >       flink-gs-fs-hadoop?
> > >    2. Are there any critical capabilities provided by the Hadoop-based GCS
> > >       connector that would be difficult or undesirable to reimplement?
> > >    3. Would a Hadoop-independent GCS filesystem provide meaningful value
> > >       for your Flink deployments?
> > >    4. Are there specific GCS features or operational concerns that should
> > >       be considered from the beginning?
> > >
> > > Looking forward to hearing the community's thoughts.
> > >
> > > Best,
> > > Samrat
> > >
> > > [1]
> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-555%3A+Flink+Native+S3+FileSystem
> > >
> > > [2]
> > > https://github.com/apache/flink/tree/master/flink-filesystems/flink-gs-fs-hadoop
> > >
> > > [3]
> > > https://github.com/GoogleCloudDataproc/hadoop-connectors/tree/master/gcs
> > >
> >
>
