Hi Samrat,

Thanks for the contribution! I've had a quick look at the code, which looks promising.
I have a couple of questions/remarks:

- A migration guide from the old connectors would be excellent. That way users can see how much effort the switch takes.
- One of the key points from an operational perspective is to have a way to make IOPS usage configurable. As an oversimplified explanation, just to get a taste: this can be kept under control in two ways and places:
  1. In Hadoop s3a, set `fs.s3a.limit.total`
  2. In the connector, set `s3.multipart.upload.min.file.size` and `s3.multipart.upload.min.part.size`

Do I understand it correctly that this is intended to be covered by the following configs?

| Option | Default | Description |
| --- | --- | --- |
| s3.upload.min.part.size | 5242880 | Minimum part size for multipart uploads (5MB) |
| s3.upload.max.concurrent.uploads | CPU cores | Maximum concurrent uploads per stream |
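If so, a minimal usage sketch of those two knobs (the option keys are taken from the draft PR and should be treated as provisional, not a finalized API):

```java
import org.apache.flink.configuration.Configuration;

public class NativeS3TuningSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Larger parts mean fewer PUT requests (lower IOPS) at the cost of more
        // buffering; 5242880 (5 MB) is the S3 minimum for all non-final parts.
        conf.setString("s3.upload.min.part.size", String.valueOf(16 * 1024 * 1024));
        // Cap in-flight part uploads per output stream to bound memory and IOPS.
        conf.setString("s3.upload.max.concurrent.uploads", "4");
    }
}
```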
> I am now drafting a formal benchmark plan based on these specifics and
> will share it with this thread in the coming days for feedback.

Waiting for the details.

BR,
G

On Wed, Nov 5, 2025 at 7:08 AM Samrat Deb <[email protected]> wrote:

Hi all,

I have a working POC for the native S3 filesystem, which is now available as a draft PR [1]. The POC is functional and has been validated in a local setup with MinIO. It's important to note that it does not yet have complete test coverage.

The immediate next step is to conduct a comprehensive benchmark comparing its performance against the existing `flink-s3-fs-hadoop` and `flink-s3-fs-presto` implementations.

I've had a very meaningful discussion with Piotr Nowojski about this offline. I am grateful for his detailed guidance on defining a rigorous benchmarking strategy, including specific cluster configurations, job workloads, and key metrics for evaluating both checkpoint/recovery performance and pure throughput. I am now drafting a formal benchmark plan based on these specifics and will share it with this thread in the coming days for feedback.

Cheers,
Samrat

[1] https://github.com/apache/flink/pull/27187

On Wed, Oct 29, 2025 at 9:31 PM Samrat Deb <[email protected]> wrote:

Thank you, Martijn, for clarifying. I will proceed with creating a task.

Thanks, Mate, for the pointer to MinIO; it is a good fit for testing.

Cheers,
Samrat

On Mon, 27 Oct 2025 at 11:55 PM, Mate Czagany <[email protected]> wrote:

Hi,

Just to add to the MinIO licensing concerns: I could not see any recent change to the license itself. MinIO moved from Apache 2.0 to AGPL-3.0 back in 2021, and the Docker image used by the tests (which is from 2022) already carries the AGPL-3.0 license. This should not be an issue, as Flink neither distributes MinIO nor makes it available over the network; it is only used by the tests.

What has changed recently is that MinIO no longer publishes public Docker images [1], so it might be worth looking into alternative solutions in the future, e.g. Garage [2].

Best regards,
Mate

[1] https://github.com/minio/minio/issues/21647#issuecomment-3418675115
[2] https://garagehq.deuxfleurs.fr/

On Mon, Oct 27, 2025 at 5:48 PM Ferenc Csaky <[email protected]> wrote:

Hi,

Really nice to see people chime in on this thread. I agree with Martijn about the development approach. There will be some iterations until we can stabilize this anyway, so we can aim to get a good-enough MVP out first, then fix issues and reach feature parity with the existing implementations as we go.

I am not a licensing expert, but AFAIK the previous images that were released under the acceptable license can continue to be used. For most integration tests we use an ancient image anyway [1]. There is one other place where the latest image gets pulled [2]; I guess it would be good to pin an explicit tag there. But AFAIK they stopped publishing to Docker Hub, so I would not expect us to end up pulling an image with a forbidden license.

Best,
Ferenc

[1] https://github.com/apache/flink/blob/fd1a97768b661f19783afe70d93a0a8d3d625b2a/flink-test-utils-parent/flink-test-utils-junit/src/main/java/org/apache/flink/util/DockerImageVersions.java#L39
[2] https://github.com/apache/flink/blob/fd1a97768b661f19783afe70d93a0a8d3d625b2a/flink-end-to-end-tests/test-scripts/common_s3_minio.sh#L51

On Sunday, October 26th, 2025 at 22:05, Martijn Visser <[email protected]> wrote:

Hi Samrat,

First of all, thanks for the proposal. It's long overdue to get this into a better state.

With regards to the schemes, I would say to ship an initial release that does not include support for s3a and s3p, and focus first on getting this new implementation into a stable state. When that's done, as a follow-up, we can consider adding support for s3a and s3p on this implementation, and when that's there, consider deprecating the older implementations. It will probably take multiple releases before we have this in a stable state.

Not directly related to this, but given that MinIO decided to change their license, do we also need to refactor existing tests to not use MinIO anymore but something else?

Thanks,

Martijn

On Sat, Oct 25, 2025 at 1:38 AM Samrat Deb <[email protected]> wrote:

Hi all,

One clarifying question regarding the URI schemes:

Currently, the Flink ecosystem uses multiple schemes to differentiate between S3 implementations: s3a:// for the Hadoop-based connector and s3p:// [1] for the Presto-based one, which is often recommended for checkpointing.

A key goal of the proposed flink-s3-fs-native is to unify these into a single implementation. With that in mind, what should be the strategy for scheme support? Should the new native S3 filesystem register only for the simple s3:// scheme, aiming to deprecate the others? Or would it be beneficial to also support s3a:// and s3p:// to provide a smoother migration path for users who may have these schemes in their existing job configurations?
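For context on what either choice would cost: Flink binds schemes through the `FileSystemFactory` SPI, so supporting extra schemes is largely a matter of shipping additional factories. A minimal sketch (`NativeS3FileSystem` is a hypothetical class name for the proposed implementation, and other SPI methods such as `configure(...)` are elided):

```java
import java.io.IOException;
import java.net.URI;

import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.core.fs.FileSystemFactory;

// Sketch only: Flink discovers factories via ServiceLoader, keyed by the
// string returned from getScheme(). Supporting s3://, s3a:// and s3p://
// would mean registering one factory per scheme, all backed by the same
// native filesystem.
public class NativeS3FileSystemFactory implements FileSystemFactory {

    @Override
    public String getScheme() {
        return "s3"; // a trivial subclass could return "s3a" or "s3p"
    }

    @Override
    public FileSystem create(URI fsUri) throws IOException {
        return new NativeS3FileSystem(fsUri); // hypothetical class
    }
}
```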
Cheers,
Samrat

[1] https://github.com/generalui/s3p

On Wed, Oct 22, 2025 at 6:31 PM Piotr Nowojski <[email protected]> wrote:

Hi Samrat,

> 1. Even if the specifics are hazy, could you recall the general nature of
> those concerns? For instance, were they related to S3's eventual
> consistency model, which has since improved, the atomicity of Multipart
> Upload commits, or perhaps complex failure/recovery scenarios during the
> commit phase?

and

> 8. The flink-s3-fs-presto connector explicitly throws an
> `UnsupportedOperationException` when `createRecoverableWriter()` is
> called. Was this a deliberate design choice to keep the Presto connector
> lightweight and optimized specifically for checkpointing, or were there
> other technical challenges that prevented its implementation at the time?
> Any context on this would be very helpful.

I very vaguely remember that at least one of those concerns was with respect to how long it takes S3 to make certain operations visible. That you think you have uploaded and committed a file, but in reality it might not be visible for tens of seconds.

Sorry, I don't remember more (or even whether there was more). I was only superficially involved in the S3 connector back then - I just participated in/overheard some discussions.

> 2. It's clear that implementing an efficient PathsCopyingFileSystem [2] is
> a non-negotiable requirement for performance. Are there any benchmark
> numbers available that can be used as a reference to evaluate how far the
> new implementation deviates?

I only have the numbers that I put in the original FLIP [1]. I don't remember the benchmark setup, but it must have been something simple. Like just letting some job accumulate 1GB of state and measuring how long the state-download phase of recovery takes.

> 3. Do you recall the workload characteristics for that PoC? Specifically,
> was the 30-40% performance advantage of s5cmd observed when copying many
> small files (like checkpoint state) or larger, multi-gigabyte files?

It was just a regular mix of compacted RocksDB sst files, with a total state size of one or at most a couple of GBs. So most of the files were around ~64MB or ~128MB, with a couple of smaller L0 files and maybe one larger L2 file.

> 4. The idea of a switchable implementation sounds great. Would you
> envision this as a configuration flag (e.g., s3.native.copy.strategy=s5cmd
> or s3.native.copy.strategy=sdk) that selects the backend implementation at
> runtime? Also, conversely, is it worth adding configuration that exposes
> some level of implementation-level information?

I think something like that should be fine, assuming that `s5cmd` will again prove significantly faster and/or more CPU efficient. If not, if SDKv2 has already improved and caught up with `s5cmd`, then it probably doesn't make sense to keep `s5cmd` support.

> 5. My understanding is that the key takeaway here is to avoid the
> file-by-file stream-based copy used in the vanilla connector and leverage
> bulk operations, which PathsCopyingFileSystem [2] enables. This seems most
> critical during state download on recovery. Please suggest whether my
> inference is in the right direction.

Yes, but you should also make the bulk transfer configurable. How many bulk transfers can happen in parallel, etc.
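For illustration, a minimal sketch of a bulk download with a hard parallelism cap, using plain AWS SDK v2 (the bucket name, key-to-file map, and class name are made up; Flink's actual `PathsCopyingFileSystem` contract has its own signatures):

```java
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;

// Sketch only: download many checkpoint files with a bounded thread pool,
// so the number of parallel transfers (and thus memory and network
// pressure) is an explicit, configurable knob.
public class BulkStateDownloader {
    public static void copyFiles(Map<String, Path> keyToLocalFile, int maxParallel)
            throws InterruptedException {
        S3Client s3 = S3Client.create();
        ExecutorService pool = Executors.newFixedThreadPool(maxParallel);
        keyToLocalFile.forEach((key, localFile) -> pool.submit(() ->
                s3.getObject(
                        GetObjectRequest.builder().bucket("my-bucket").key(key).build(),
                        localFile)));
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        s3.close();
    }
}
```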
> 6. The warning about `s5cmd` causing OOMs sounds like an indication to
> consider an `S3TransferManager` [3] based implementation, which might
> offer more granular control over buffering and in-flight requests. Do you
> think exploring `S3TransferManager` further would be valuable?

I'm pretty sure that if you start hundreds of bulk transfers in parallel via the `S3TransferManager`, you can get the same problems with running out of memory or exceeding the available network throughput. I don't know whether `S3TransferManager` is better or worse in that regard, to be honest.
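For reference, the basic shape of the SDK v2 transfer manager API (bucket, key, and local path below are placeholders); note that the concurrency and throughput limits it enforces come from the underlying, separately configured S3 client, not from the call site:

```java
import java.nio.file.Paths;

import software.amazon.awssdk.transfer.s3.S3TransferManager;
import software.amazon.awssdk.transfer.s3.model.DownloadFileRequest;
import software.amazon.awssdk.transfer.s3.model.FileDownload;

public class TransferManagerSketch {
    public static void main(String[] args) {
        // Sketch only: the default manager wraps a default S3 client;
        // production code would tune that client's limits explicitly.
        try (S3TransferManager tm = S3TransferManager.create()) {
            FileDownload download = tm.downloadFile(DownloadFileRequest.builder()
                    .getObjectRequest(b -> b.bucket("my-bucket").key("chk-42/000042.sst"))
                    .destination(Paths.get("/tmp/000042.sst"))
                    .build());
            download.completionFuture().join(); // block until the file lands
        }
    }
}
```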
> 7. The insight on AWS aggressively dropping packets instead of gracefully
> throttling is invaluable. I currently have a limited understanding of how
> AWS behaves when throttling; I will dig deeper into it and follow up with
> findings or doubts. To counter this, were you thinking of a configurable
> rate limiter within the filesystem itself (e.g., setting max bandwidth or
> max concurrent requests), or something more dynamic that could adapt to
> network conditions?

Flat rate limiting is tricky because AWS offers burst network capacity, which comes in very handy and in the vast majority of cases works fine. But for some jobs, if you exceed that burst capacity, AWS starts dropping your packets and then the problems begin. On the other hand, if you rate limit to your normal capacity, you are leaving a lot of network throughput unused during recoveries.

At the same time, AWS doesn't share details of the burst capacity, so it's sometimes tricky to configure the whole system properly. I don't have a universally good answer for that :(

Best,
Piotrek

On Tue, 21 Oct 2025 at 21:40, Samrat Deb <[email protected]> wrote:

Hi Gabor / Ferenc,

Thank you for sharing the pointer and the valuable feedback.

The link to the custom `XmlResponsesSaxParser` [1] looks scary 😦 and hides real complexity.

1. Could you share some context on why this custom parser was necessary? Was it to work around a specific bug, a performance issue, or an inconsistency in the S3 XML API responses that the default AWS SDK parser couldn't handle at the time? With SDK v2, what core functionality needs to be tested most intensively?

2. You mentioned it has no Hadoop dependency, which is great news. For a new native S3 connector, would integration simply require implementing a new S3DelegationTokenProvider/Receiver pair using the AWS SDK, or are there more subtle integration points with the framework that need to be accounted for?

3. I remember solving the serialized-Throwable exception issue [2], which led to a new bug [3]: the initial fix introduced a regression that Gabor later solved, with Ferenc providing detailed root-cause insights [4] 😅. It's hard to be fully sure that all scenarios are covered properly; this is just one example, and there can be other unknowns. What would be the best approach to test for and prevent such regressions, or unknown unknowns, especially in the most sensitive parts of the filesystem logic?

Cheers,
Samrat

[1] https://github.com/apache/flink/blob/0e4e6d7082e83f098d0c1a94351babb3ea407aa8/flink-filesystems/flink-s3-fs-base/src/main/java/com/amazonaws/services/s3/model/transform/XmlResponsesSaxParser.java
[2] https://issues.apache.org/jira/browse/FLINK-28513
[3] https://github.com/apache/flink/pull/25231
[4] https://github.com/apache/flink/pull/25231#issuecomment-2312059662
On Tue, 21 Oct 2025 at 3:49 PM, Gabor Somogyi <[email protected]> wrote:

Hi Samrat,

+1 on the direction of moving away from Hadoop.

This is a long-standing discussion: replacing the two mentioned connectors with something better. Both of them have their own weaknesses, and I've fixed several blockers inside them.

There is definitely magic inside them, please see [1] for an example, and there is more 🙂. I think the most sensitive part is recovery, because it is hard to test all the cases.

@Ferenc

> One thing that comes to my mind that will need some changes, and whose
> involvement in this change is not trivial, is the delegation token
> framework. Currently it is also tied to the Hadoop stuff and has some
> abstract classes in the base S3 FS module.

The delegation token framework has no dependency on Hadoop, so there is no blocker on the road, but I'm here to help if any questions appear.
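To give a feel for what a native provider might do, a rough sketch using plain AWS SDK v2 STS calls. The wiring into Flink's DelegationTokenProvider/Receiver SPI is deliberately omitted, and the class name and serialization format here are made up:

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

import software.amazon.awssdk.services.sts.StsClient;
import software.amazon.awssdk.services.sts.model.Credentials;
import software.amazon.awssdk.services.sts.model.GetSessionTokenRequest;

// Sketch only: obtain short-lived S3 credentials on the JobManager and
// serialize them for distribution to TaskManagers. A real provider would
// implement Flink's DelegationTokenProvider SPI and let the framework
// schedule re-obtainment before the credentials expire.
public class NativeS3TokenSketch {
    public static Optional<byte[]> obtainTokens() {
        try (StsClient sts = StsClient.create()) {
            Credentials creds = sts.getSessionToken(
                            GetSessionTokenRequest.builder().durationSeconds(3600).build())
                    .credentials();
            String serialized = String.join("\n",
                    creds.accessKeyId(),
                    creds.secretAccessKey(),
                    creds.sessionToken(),
                    creds.expiration().toString());
            return Optional.of(serialized.getBytes(StandardCharsets.UTF_8));
        }
    }
}
```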
BR,
G

[1] https://github.com/apache/flink/blob/0e4e6d7082e83f098d0c1a94351babb3ea407aa8/flink-filesystems/flink-s3-fs-base/src/main/java/com/amazonaws/services/s3/model/transform/XmlResponsesSaxParser.java#L95-L104

On Tue, Oct 14, 2025 at 8:19 PM Samrat Deb <[email protected]> wrote:

Hi All,

Poorvank (cc'ed) and I are writing to start a discussion about a potential improvement for Flink: creating a new, native S3 filesystem independent of Hadoop/Presto.

The goal of this proposal is to address several challenges related to Flink's S3 integration and to simplify the flink-s3-filesystem. If this discussion gains positive traction, the next step would be to move forward with a formalised FLIP.

The challenges with the current S3 connectors:

Currently, Flink offers two primary S3 filesystems, flink-s3-fs-hadoop [1] and flink-s3-fs-presto [2]. While functional, this dual-connector approach has a few issues:

1. The flink-s3-fs-hadoop connector adds an additional dependency to manage. Upgrades like AWS SDK v2 depend on Hadoop/Presto supporting them first before they can be leveraged in the flink-s3-filesystem, so it is sometimes restrictive to use features directly from the AWS SDK.

2. The flink-s3-fs-presto connector was introduced to mitigate the performance issues of the Hadoop connector, especially for checkpointing. However, it lacks a RecoverableWriter implementation. This is sometimes confusing for Flink users and highlights the need for a single, unified solution.

Proposed solution: a native, Hadoop-free S3 filesystem

I propose we develop a new filesystem, let's call it flink-s3-fs-native, built directly on the modern AWS SDK for Java v2. This approach would be free of any Hadoop or Presto dependencies. I have done a small prototype to validate it [3].

This is motivated by Trino's S3 work [4]. The Trino project successfully undertook a similar migration, moving from Hadoop-based object storage clients to their own native implementations.

The new Flink S3 filesystem would:

1. Provide a single, unified connector for all S3 interactions, from state backends to sinks.

2. Implement a high-performance S3RecoverableWriter using S3's Multipart Upload feature, ensuring exactly-once sink semantics (see the sketch at the end of this mail).

3. Offer a clean, self-contained dependency, drastically simplifying setup and eliminating external dependencies.

A phased migration path:

To ensure a smooth transition, we could adopt a phased approach, at a very high level:

Phase 1:
Introduce the new native S3 filesystem as an optional, parallel plugin. This would allow for community testing and adoption without breaking existing setups.

Phase 2:
Once the native connector achieves feature parity and proven stability, we update the documentation to recommend it as the default choice for all S3 use cases.

Phase 3:
In a future major release, the legacy flink-s3-fs-hadoop and flink-s3-fs-presto connectors could be formally deprecated, with clear migration guides provided for users.

I would love to hear the community's thoughts on this. A few questions to start the discussion:

1. What are the biggest pain points with the current S3 filesystem?

2. Are there any critical features from the Hadoop S3A client that are essential to replicate in a native implementation?

3. Would a simplified, dependency-free S3 experience be a valuable improvement for Flink use cases?
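As promised above, a minimal sketch of the Multipart Upload flow an S3RecoverableWriter would build on. This is the plain AWS SDK v2 API, not proposed Flink code; the bucket and key are placeholders:

```java
import java.util.List;

import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.CompletedMultipartUpload;
import software.amazon.awssdk.services.s3.model.CompletedPart;

public class MultipartUploadSketch {
    public static void main(String[] args) {
        String bucket = "my-bucket";   // placeholder
        String key = "output/part-0";  // placeholder
        S3Client s3 = S3Client.create();

        // 1) Start the upload. The uploadId (plus the part ETags) is exactly
        //    the state a RecoverableWriter would persist in a checkpoint.
        String uploadId =
                s3.createMultipartUpload(b -> b.bucket(bucket).key(key)).uploadId();

        // 2) Upload a part. Every part except the last must be >= 5 MB.
        byte[] data = new byte[5 * 1024 * 1024];
        String etag = s3.uploadPart(
                b -> b.bucket(bucket).key(key).uploadId(uploadId).partNumber(1),
                RequestBody.fromBytes(data)).eTag();

        // 3) Commit. The object stays invisible until this call succeeds,
        //    which is what makes "publish on checkpoint" exactly-once.
        s3.completeMultipartUpload(b -> b.bucket(bucket).key(key)
                .uploadId(uploadId)
                .multipartUpload(CompletedMultipartUpload.builder()
                        .parts(List.of(CompletedPart.builder()
                                .partNumber(1).eTag(etag).build()))
                        .build()));
        s3.close();
    }
}
```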
Cheers,
Samrat

[1] https://github.com/apache/flink/tree/master/flink-filesystems/flink-s3-fs-hadoop
[2] https://github.com/apache/flink/tree/master/flink-filesystems/flink-s3-fs-presto
[3] https://github.com/Samrat002/flink/pull/4
[4] https://github.com/trinodb/trino/tree/master/lib/trino-filesystem-s3
