Hi Samrat,

Thank you for working on this. I agree that the community would benefit
from a native filesystem implementation, for much the same reasons as
those raised in [1]. I am actively working on an "umbrella" FLIP for
cross-cloud support, and your proposal naturally fills the gap for GCS.

Speaking of pain points related to the Hadoop connectors, I would like to
mention:
1. Complexity of CVE management.
2. Challenges with dependency upgrades, including Java version upgrades.
3. Lack of support for client-side encryption with custom key providers
(especially in a cross-cloud manner).

I am looking forward to collaborating with you on Hadoop-free Flink
filesystem support.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-555%3A+Flink+Native+S3+FileSystem

Kind regards,
Alex


On Mon, 22 Jun 2026 at 06:44, Samrat Deb <[email protected]> wrote:

> Hi all,
>
> Poorvank (cc'ed) and I would like to start a discussion about a potential
> improvement to Flink's Google Cloud Storage integration: a native GCS
> filesystem that is independent of Hadoop. We did the same earlier for
> S3 [1].
>
> The overall aim is to move towards a Hadoop-free Flink filesystem and to
> unlock potential performance benefits for Flink's core requirements.
>
> The goal of this proposal is to explore whether Flink would benefit from a
> first-class GCS filesystem implementation built directly on top of Google
> Cloud Storage client libraries rather than relying on the Hadoop connector.
> If the discussion gains positive traction, the next step would be to
> prepare a formal FLIP.
>
> The Current State
> Today, Flink's GCS support is provided through flink-gs-fs-hadoop [2],
> which is based on Google's Cloud Storage Hadoop connector [3].
>
> This approach has served Flink well, but it also introduces some
> limitations:
>
>    1. Flink's GCS integration depends on the Hadoop filesystem abstraction
>       and the Hadoop-based GCS connector. As a result, upgrades and feature
>       adoption are tied to the evolution of those external components.
>    2. The dependency stack is larger than necessary for users who only
>       require Google Cloud Storage support. In practice, users must bring
>       in Hadoop-based components even though the underlying storage system
>       is an object store.
>    3. Leveraging new capabilities from Google Cloud Storage often requires
>       waiting for support to become available through the Hadoop connector
>       before Flink can benefit from them.
>
> Proposed Direction
>
> I would like to explore the feasibility of a new filesystem implementation,
> tentatively named flink-gs-fs-native, built directly on top of Google Cloud
> Storage client libraries.
>
> The goals would be:
>
>    1. Provide a Hadoop-independent implementation of Flink's FileSystem API
>       for Google Cloud Storage.
>    2. Reduce dependency complexity and make the GCS integration easier to
>       maintain and evolve.
>    3. Allow Flink to adopt new Google Cloud Storage features and performance
>       improvements directly, without depending on Hadoop abstractions.
>    4. Continue supporting Flink features such as checkpointing, savepoints,
>       state backends, and file sinks through a native implementation.
>
> A Possible Migration Path
>
> To ensure a smooth transition, a phased approach could be considered:
>
> Phase 1:
> Introduce the native GCS filesystem as an optional plugin alongside the
> existing flink-gs-fs-hadoop connector.
>
> Phase 2:
> Gather community feedback, validate production readiness, and achieve
> feature parity with the existing implementation.
>
> Phase 3:
> If the native implementation proves mature and broadly adopted, discuss
> whether the Hadoop-based implementation should remain, be deprecated, or
> continue to coexist.
>
> Questions for the Community
>
>    1. What are the biggest pain points users face today with
>       flink-gs-fs-hadoop?
>    2. Are there any critical capabilities provided by the Hadoop-based GCS
>       connector that would be difficult or undesirable to reimplement?
>    3. Would a Hadoop-independent GCS filesystem provide meaningful value for
>       your Flink deployments?
>    4. Are there specific GCS features or operational concerns that should be
>       considered from the beginning?
>
> Looking forward to hearing the community's thoughts.
>
> Best,
> Samrat
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-555%3A+Flink+Native+S3+FileSystem
>
> [2]
> https://github.com/apache/flink/tree/master/flink-filesystems/flink-gs-fs-hadoop
>
> [3]
> https://github.com/GoogleCloudDataproc/hadoop-connectors/tree/master/gcs
>
