Hi Samrat,

Thank you for working on this. I agree that the community would benefit from the introduction of a native filesystem implementation, for reasons similar to those raised in [1]. I am actively working on an "umbrella" FLIP for cross-cloud support, and your proposal naturally fills the gap for GCS.
Speaking of pain points related to the Hadoop connectors, I would like to mention:

1. Complexity of CVE management.
2. Challenges with dependency upgrades, including Java version upgrades.
3. Lack of support for client-side encryption with custom key providers (especially in a cross-cloud manner).

I am looking forward to collaborating with you on Hadoop-less Flink filesystem support.

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-555%3A+Flink+Native+S3+FileSystem

Kind regards,
Alex

On Mon, 22 Jun 2026 at 06:44, Samrat Deb <[email protected]> wrote:

> Hi all,
>
> Poorvank (cc'ed) & I would like to start a discussion about a potential
> improvement for Flink's Google Cloud Storage integration to create a
> native GCS filesystem independent of Hadoop. Earlier we were able to do
> this for S3 [1].
>
> The entire effort is to move forward to a Hadoop-free Flink Filesystem
> and unlock potential performance benefits for Flink's focus requirements.
>
> The goal of this proposal is to explore whether Flink would benefit from
> a first-class GCS filesystem implementation built directly on top of
> Google Cloud Storage client libraries rather than relying on the Hadoop
> connector. If the discussion gains positive traction, the next step would
> be to prepare a formal FLIP.
>
> The Current State
>
> Today, Flink's GCS support is provided through flink-gs-fs-hadoop [2],
> which is based on Google's Cloud Storage Hadoop connector [3].
>
> This approach has served Flink well, but it also introduces some
> limitations:
>
> 1. Flink's GCS integration depends on the Hadoop filesystem abstraction
>    and the Hadoop-based GCS connector. As a result, upgrades and feature
>    adoption are tied to the evolution of those external components.
> 2. The dependency stack is larger than necessary for users who only
>    require Google Cloud Storage support. In practice, users must bring in
>    Hadoop-based components even though the underlying storage system is
>    an object store.
> 3. Leveraging new capabilities from Google Cloud Storage often requires
>    waiting for support to become available through the Hadoop connector
>    before Flink can benefit from them.
>
> Proposed Direction
>
> I would like to explore the feasibility of a new filesystem
> implementation, tentatively named flink-gs-fs-native, built directly on
> top of Google Cloud Storage client libraries.
>
> The goals would be:
>
> 1. Provide a Hadoop-independent implementation of Flink's FileSystem API
>    for Google Cloud Storage.
> 2. Reduce dependency complexity and make the GCS integration easier to
>    maintain and evolve.
> 3. Allow Flink to adopt new Google Cloud Storage features and performance
>    improvements directly, without depending on Hadoop abstractions.
> 4. Continue supporting Flink features such as checkpointing, savepoints,
>    state backends, and file sinks through a native implementation.
>
> A Possible Migration Path
>
> To ensure a smooth transition, a phased approach could be considered:
>
> Phase 1:
> Introduce the native GCS filesystem as an optional plugin alongside the
> existing flink-gs-fs-hadoop connector.
>
> Phase 2:
> Gather community feedback, validate production readiness, and achieve
> feature parity with the existing implementation.
>
> Phase 3:
> If the native implementation proves mature and broadly adopted, discuss
> whether the Hadoop-based implementation should remain, be deprecated, or
> continue to coexist.
>
> Questions for the Community
>
> 1. What are the biggest pain points users face today with
>    flink-gs-fs-hadoop?
> 2. Are there any critical capabilities provided by the Hadoop-based GCS
>    connector that would be difficult or undesirable to reimplement?
> 3. Would a Hadoop-independent GCS filesystem provide meaningful value for
>    your Flink deployments?
> 4. Are there specific GCS features or operational concerns that should be
>    considered from the beginning?
>
> Looking forward to hearing the community's thoughts.
>
> Best,
> Samrat
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-555%3A+Flink+Native+S3+FileSystem
> [2] https://github.com/apache/flink/tree/master/flink-filesystems/flink-gs-fs-hadoop
> [3] https://github.com/GoogleCloudDataproc/hadoop-connectors/tree/master/gcs
