Thank you, Aleksandr Iushmanov, for reviewing the proposal.

Please find my responses to the high-level questions below.

> 1. We discuss that multipart upload has a minimum part size of 5 MB,
does it mean that we are limited to "commit" less than 5 MB of data?
Would it mean that users with low traffic would have large end to end
latency or is it still possible to "commit" the data on checkpoint and
restart multipart upload?

Users with low traffic are not penalised with high latency. Although S3
enforces a 5 MB minimum size for every multipart part except the last one,
Flink handles the "tail" (data < 5 MB) by serializing the raw bytes directly
into the checkpoint state (the CommitRecoverable object). This ensures the
checkpoint is durable from the start.
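To make the tail handling concrete, here is a minimal, self-contained sketch. The class and field names (TailAwareUploader, Recoverable) are illustrative only, not Flink's actual RecoverableWriter API: complete 5 MB chunks become multipart parts, while the remainder travels inside the recoverable object that is serialized into checkpoint state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; names do not match Flink's real classes. */
class TailAwareUploader {
    static final int MIN_PART_SIZE = 5 * 1024 * 1024; // S3 minimum per non-final part

    final List<byte[]> completedParts = new ArrayList<>(); // stands in for parts on S3
    byte[] tail = new byte[0];                             // data below the minimum size

    void write(byte[] data) {
        // Append the new data to whatever tail is still buffered locally.
        byte[] merged = new byte[tail.length + data.length];
        System.arraycopy(tail, 0, merged, 0, tail.length);
        System.arraycopy(data, 0, merged, tail.length, data.length);
        int off = 0;
        // Upload every complete 5 MB chunk as a multipart part ...
        while (merged.length - off >= MIN_PART_SIZE) {
            completedParts.add(Arrays.copyOfRange(merged, off, off + MIN_PART_SIZE));
            off += MIN_PART_SIZE;
        }
        // ... and keep the remainder in the local buffer.
        tail = Arrays.copyOfRange(merged, off, merged.length);
    }

    /** On checkpoint: the "recoverable" carries part metadata plus the raw tail bytes. */
    Recoverable persist() {
        return new Recoverable(completedParts.size(), tail.clone());
    }

    record Recoverable(int numParts, byte[] trailingBytes) {}
}
```

Even when fewer than 5 MB have been written, persist() produces a recoverable whose trailingBytes make the data durable via the checkpoint, with no extra round-trip to S3.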

> 2. To gain trust in this new file system, we need extensive testing of
failover/recovery and ensure it doesn't lead to data loss / object
leaks / memory leaks etc. Have we already covered some of the basic
durability testing as part of PoC or is it a part of the testing plan?

Durability has been a primary focus of the PoC.
a) The RecoverableMultiPartUploadImplTest uses a StubMultiPartUploader to
simulate network failures during part uploads.
b) The test verifies that the S3Recoverable state correctly captures ETags
and part numbers, ensuring that a recovery attempt correctly identifies
which parts are already on S3 and which must be re-uploaded from the local
buffer.
c) For object-leak prevention, the S3Committer includes logic to check
whether a file was already committed during recovery, preventing duplicate
object creation or orphaned MPUs.
d) I conducted thorough internal testing with a large state and
approximately 140 TB of data written in streaming mode to S3 using
flink-s3-fs-native. No anomalies or data integrity issues were discovered.
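To illustrate the recovery decision described in (b), here is a hypothetical sketch. RecoveryPlanner and partsToReupload are invented names for illustration; the real logic lives in the S3Recoverable handling. A part counts as durable only when the ETag captured in the checkpointed state matches what S3 reports (e.g. via ListParts); everything else must be re-uploaded from the local buffer.

```java
import java.util.Map;
import java.util.Set;

/** Illustrative sketch only; names do not match Flink's real classes. */
class RecoveryPlanner {
    /**
     * Compare the (partNumber -> ETag) map captured at checkpoint time with
     * the parts S3 actually reports, and return the part numbers that must
     * be re-uploaded from the local buffer.
     */
    static Set<Integer> partsToReupload(Map<Integer, String> checkpointedEtags,
                                        Map<Integer, String> etagsOnS3) {
        Set<Integer> missing = new java.util.TreeSet<>();
        for (Map.Entry<Integer, String> e : checkpointedEtags.entrySet()) {
            // Durable only if S3 has the part AND the ETag matches exactly.
            if (!e.getValue().equals(etagsOnS3.get(e.getKey()))) {
                missing.add(e.getKey());
            }
        }
        return missing;
    }
}
```

The comparison is per-part, so a recovery after a mid-upload failure re-sends only the parts S3 never durably received, rather than restarting the whole multipart upload.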



On Mon, Feb 9, 2026 at 11:07 PM Aleksandr Iushmanov <[email protected]>
wrote:

> Hi Samrat,
>
> Thank you for putting a very detailed FLIP together!
>
> I have a few suggestions to strengthen the proposal:
> 1. Could we create a "Public interfaces" section? At the moment,
> proposed interfaces are to be found in multiple parts of the doc and
> it makes it harder to get general direction.
> 2. Current PoC implementation contains more configurations than is
> outlined in the FLIP, I understand that this part will be evolving and
> it would be good to have a general review of public contracts as part
> of the FLIP.
> 3. Could we call out our testing strategy on the path to production
> readiness? Will we mark configurations that enable this feature
> @experimental? What would be our acceptance criteria to consider it
> production ready?
> 4. I assume we imply full state compatibility during migration through
> "load with legacy hadoop, then write using new fs", should we
> expand on migration strategy to ensure that we have a clear path
> forward? For example, would migration involve setting up both schemas
> (s3a with legacy as recovery path + s3 with new FS as checkpoint path)
> and packaging both implementations in the `plugins` directory to
> perform transition?
> 5. CRT support is called out in the FLIP, but doesn't seem to be a
> part of PoC implementation, are we going to add it as a follow up?
> 6. It looks like PoC implementation already supports server side
> encryption for SSE-KMS, so it would be great to call this out in the
> FLIP. At a glance adding support for other SSE approaches (like SSE-C
> and Client side encryption) is not that straightforward with PoC
> implementation as SSE-KMS. Is it worth considering it as a child FLIP
> for prod migration?
> 7. This FLIP suggests that we want to replace Flink dependency on
> hadoop/presto. Are we considering having some "uber" FLIP covering
> implementation for Azure/GCP as well?
> 8. FLIP suggests that we can significantly decrease packaged JAR size,
> could we provide guidance on the size of package SDK native FS with
> shaded dependencies to strengthen this selling point?
>
> I also have are a couple of high level questions:
>
> 1. We discuss that multipart upload has a minimum part size of 5 MB,
> does it mean that we are limited to "commit" less than 5 MB of data?
> Would it mean that users with low traffic would have large end to end
> latency or is it still possible to "commit" the data on checkpoint and
> restart multipart upload?
>
> 2. To gain trust in this new file system, we need extensive testing of
> failover/recovery and ensure it doesn't lead to data loss / object
> leaks / memory leaks etc. Have we already covered some of the basic
> durability testing as part of PoC or is it a part of the testing plan?
>
> Kind regards,
> Alex
>
> On Fri, 6 Feb 2026 at 09:17, Samrat Deb <[email protected]> wrote:
> >
> > Hi everyone,
> >
> > Following up on our earlier Thread[1] regarding the architectural
> > fragmentation of S3 support, I would like to formally present the
> progress
> > on introducing a native S3 filesystem for Flink.
> >
> > The current "dual-connector" ecosystem—split between flink-s3-fs-hadoop
> and
> > flink-s3-fs-presto—has reached its technical limits. The Hadoop-based
> > implementation introduces significant dependency bloat and persistent
> > classpath conflicts, while the Presto-based connector lacks
> > RecoverableWriter forcing users to manage multiple configurations for
> > exactly-once sinks.
> >
> > To resolve this, I am proposing FLIP-555: Flink Native S3 FileSystem[2].
> > This implementation is built directly on the AWS SDK for Java v2,
> providing
> > a unified, high-performance, and Hadoop-free solution for all S3
> > interactions.
> >
> > I have conducted benchmarking comparing the native implementation against
> > the existing Presto-based filesystem. The initial results are highly
> > motivating, with a visible performance gain. You can find the detailed
> > performance analysis here[3]
> >
> > Following offline discussions with Piotr Nowojski and Gabor Somogyi, the
> > POC and benchmarking results are good enough to validate that Native S3
> > FileSystem would be a valuable addition to Flink.
> >
> > With the addition of the Native S3 FileSystem, I have also discussed
> > briefly the Deprecation Strategy to ensure operational stability in the
> > FLIP.
> >
> >
> >    1.
> >
> >    Phase 1: Introduce flink-s3-fs-native as an optional plugin for
> >    community validation.
> >    2.
> >
> >    Phase 2: Promote the native connector to the recommended default once
> >    feature parity and stability are proven.
> >    3.
> >
> >    Phase 3: Formally deprecate the legacy Hadoop and Presto connectors
> in a
> >    future major release.
> >
> > Looking forward to your feedback and suggestions on the design and
> > implementation details outlined in the FLIP.
> >
> >
> > Cheers,
> > Samrat
> >
> >
> > [1] https://lists.apache.org/thread/2bllhqlbv0pz6t95tsjbszpm9bp9911c
> >
> > [2]
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-555%3A+Flink+Native+S3+FileSystem
> >
> > [3]
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=406620396
>