Thank you, Aleksandr, for adding more to the proposal. Looking forward to collaborating on this project.
Best,
Samrat

On Mon, Jun 22, 2026 at 10:46 PM Aleksandr Iushmanov <[email protected]> wrote:

> Hi Samrat,
>
> Thank you for working on this. I agree that the community would benefit
> from the introduction of a native filesystem implementation, for
> motivation similar to that raised in [1]. I am actively working on an
> "Umbrella" FLIP for cross-cloud support, and your proposal naturally
> fills the gap for GCS.
>
> Speaking of pain points related to the Hadoop connectors, I would like
> to mention:
> 1. Complexity of CVE management.
> 2. Challenges with dependency upgrades, including Java version upgrades.
> 3. Lack of support for client-side encryption with custom key providers
>    (especially in a cross-cloud manner).
>
> I am looking forward to collaborating with you on Hadoop-less Flink
> filesystem support.
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-555%3A+Flink+Native+S3+FileSystem
>
> Kind regards,
> Alex
>
> On Mon, 22 Jun 2026 at 06:44, Samrat Deb <[email protected]> wrote:
>
> > Hi all,
> >
> > Poorvank (cc'ed) and I would like to start a discussion about a
> > potential improvement to Flink's Google Cloud Storage integration:
> > a native GCS filesystem that is independent of Hadoop. We were able
> > to do the same for S3 earlier [1].
> >
> > The overall effort is to move towards a Hadoop-free Flink filesystem
> > and unlock potential performance benefits for the requirements Flink
> > focuses on.
> >
> > The goal of this proposal is to explore whether Flink would benefit
> > from a first-class GCS filesystem implementation built directly on
> > top of the Google Cloud Storage client libraries rather than relying
> > on the Hadoop connector. If the discussion gains positive traction,
> > the next step would be to prepare a formal FLIP.
> >
> > The Current State
> >
> > Today, Flink's GCS support is provided through flink-gs-fs-hadoop [2],
> > which is based on Google's Cloud Storage Hadoop connector [3].
> >
> > This approach has served Flink well, but it also introduces some
> > limitations:
> >
> > 1. Flink's GCS integration depends on the Hadoop filesystem
> >    abstraction and the Hadoop-based GCS connector. As a result,
> >    upgrades and feature adoption are tied to the evolution of those
> >    external components.
> > 2. The dependency stack is larger than necessary for users who only
> >    require Google Cloud Storage support. In practice, users must
> >    bring in Hadoop-based components even though the underlying
> >    storage system is an object store.
> > 3. Leveraging new capabilities from Google Cloud Storage often
> >    requires waiting for support to become available through the
> >    Hadoop connector before Flink can benefit from them.
> >
> > Proposed Direction
> >
> > I would like to explore the feasibility of a new filesystem
> > implementation, tentatively named flink-gs-fs-native, built directly
> > on top of the Google Cloud Storage client libraries.
> >
> > The goals would be:
> >
> > 1. Provide a Hadoop-independent implementation of Flink's FileSystem
> >    API for Google Cloud Storage.
> > 2. Reduce dependency complexity and make the GCS integration easier
> >    to maintain and evolve.
> > 3. Allow Flink to adopt new Google Cloud Storage features and
> >    performance improvements directly, without depending on Hadoop
> >    abstractions.
> > 4. Continue supporting Flink features such as checkpointing,
> >    savepoints, state backends, and file sinks through a native
> >    implementation.
> >
> > A Possible Migration Path
> >
> > To ensure a smooth transition, a phased approach could be considered:
> >
> > Phase 1:
> > Introduce the native GCS filesystem as an optional plugin alongside
> > the existing flink-gs-fs-hadoop connector.
> >
> > Phase 2:
> > Gather community feedback, validate production readiness, and achieve
> > feature parity with the existing implementation.
> >
> > Phase 3:
> > If the native implementation proves mature and broadly adopted,
> > discuss whether the Hadoop-based implementation should remain, be
> > deprecated, or continue to coexist.
> >
> > Questions for the Community
> >
> > 1. What are the biggest pain points users face today with
> >    flink-gs-fs-hadoop?
> > 2. Are there any critical capabilities provided by the Hadoop-based
> >    GCS connector that would be difficult or undesirable to
> >    reimplement?
> > 3. Would a Hadoop-independent GCS filesystem provide meaningful value
> >    for your Flink deployments?
> > 4. Are there specific GCS features or operational concerns that
> >    should be considered from the beginning?
> >
> > Looking forward to hearing the community's thoughts.
> >
> > Best,
> > Samrat
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-555%3A+Flink+Native+S3+FileSystem
> >
> > [2]
> > https://github.com/apache/flink/tree/master/flink-filesystems/flink-gs-fs-hadoop
> >
> > [3]
> > https://github.com/GoogleCloudDataproc/hadoop-connectors/tree/master/gcs
