Hi Alex,

Thank you for the detailed feedback.
Please find my responses to your suggestions and questions below:

> 1. Could we create a "Public interfaces" section? At the moment, proposed interfaces are to be found in multiple parts of the doc and it makes it harder to get general direction.

I have consolidated the public interfaces into a dedicated section in the FLIP. This covers the NativeS3FileSystem, the NativeS3RecoverableWriter, and the internal but critical S3AccessHelper, which abstracts our SDK interactions.

> 2. Current PoC implementation contains more configurations than is outlined in the FLIP, I understand that this part will be evolving and it would be good to have a general review of public contracts as part of the FLIP.

You are correct; the PoC had diverged slightly. I have updated the FLIP with an exhaustive list of all public configuration keys. We are standardizing on the s3. prefix (e.g., s3.access-key, s3.upload.min-part-size) to provide a clean break from the legacy presto.s3. and s3a. namespaces.

> 3. Could we call out our testing strategy on the path to production readiness? Will we mark configurations that enable this feature @experimental? What would be our acceptance criteria to consider it production ready?

We will mark the initial release of the native filesystem as @Experimental. Acceptance criteria for "Production Ready" status will include:
a) Full functional parity with the legacy connectors (including PathsCopyingFileSystem support).
b) Zero reported data integrity issues over a full release cycle.
c) Stable memory profiles during high-concurrency recovery.

> 4. I assume we imply full state compatibility during migration through "load with legacy hadoop, then write using new new fs", should we expand on migration strategy to ensure that we have a clear path forward? For example, would migration involve setting up both schemas (s3a with legacy as recovery path + s3 with new FS as checkpoint path) and packaging both implementations in the `plugins` directory to perform transition?

Currently, Flink typically loads one S3 plugin at a time. However, flink-s3-fs-native is designed to support multiple schemes (including s3:// and s3a://) in the PoC to facilitate easier cut-overs. Migration involves switching the plugin JAR; since the native connector respects the standard S3 object format, it can "write forward" from existing state.

To execute the migration for any existing job, an engineer only needs to perform a "JAR swap" in the Flink plugins/ directory:
1. Delete the legacy connector (e.g., flink-s3-fs-hadoop-*.jar).
2. Add the flink-s3-fs-native-*.jar.
3. Update the legacy configuration keys (presto.s3.* or s3a.*) to the native connector's own s3.* settings; the native connector provides a one-to-one equivalent for these, so credentials and endpoints are carried over correctly (a sketch of the resulting configuration follows this list).
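To make step 3 concrete, a minimal flink-conf.yaml after the swap could look roughly like the snippet below. Apart from s3.access-key and s3.upload.min-part-size mentioned above, the key names here are placeholders on my side; the configuration table in the FLIP is the authoritative list.

```
# Illustrative only - see the FLIP's configuration table for the final key names.
s3.access-key: <access-key>
s3.secret-key: <secret-key>                      # placeholder key name
s3.endpoint: https://s3.eu-west-1.amazonaws.com  # placeholder key name
s3.upload.min-part-size: 5m
```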
> 5. CRT support is called out in the FLIP, but doesn't seem to be a part of PoC implementation, are we going to add it as a follow up?

Yes. While the initial implementation prioritizes the Netty-based asynchronous client for stability, the architecture is ready for the AWS CRT client via the S3TransferManager. I plan to add this as a high-priority follow-up once the core logic is merged; a rough sketch of how the CRT client would be wired in is below.
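This is only a sketch of the intended wiring, not the final code: it assumes the AWS CRT client dependency is on the classpath, and the tuning values are illustrative.

```
import software.amazon.awssdk.services.s3.S3AsyncClient;
import software.amazon.awssdk.transfer.s3.S3TransferManager;

public final class CrtTransferManagerSketch {

    // Builds a TransferManager backed by the CRT-based async S3 client.
    static S3TransferManager createTransferManager() {
        S3AsyncClient crtClient = S3AsyncClient.crtBuilder()
                .targetThroughputInGbps(10.0)             // illustrative tuning value
                .minimumPartSizeInBytes(8L * 1024 * 1024) // illustrative tuning value
                .build();

        return S3TransferManager.builder()
                .s3Client(crtClient)
                .build();
    }
}
```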
> 6. It looks like PoC implementation already supports server side encryption for SSE-KMS, so it would be great to call this out in the FLIP. At a glance adding support for other SSE approaches (like SSE-C and Client side encryption) is not that straightforward with PoC implementation as SSE-KMS. Is it worth considering it as a child FLIP for prod migration?

I've called out the existing SSE-KMS support in the FLIP. I agree that SSE-C and client-side encryption are more complex; I have marked those as "Phase 2" tasks or potential child FLIPs to avoid blocking the primary release. For reference, a simplified sketch of how SSE-KMS is expressed with the SDK v2 request builders is below.
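Roughly speaking, SSE-KMS with the AWS SDK v2 amounts to setting the encryption attributes on the request builders; the snippet below is a simplified illustration (bucket, object key, and KMS key ARN are placeholders), not the actual PoC code.

```
import software.amazon.awssdk.services.s3.model.CreateMultipartUploadRequest;
import software.amazon.awssdk.services.s3.model.ServerSideEncryption;

public final class SseKmsSketch {

    // Attaches SSE-KMS attributes to a multipart upload request
    // (placeholder bucket, object key, and KMS key ARN).
    static CreateMultipartUploadRequest sseKmsUploadRequest() {
        return CreateMultipartUploadRequest.builder()
                .bucket("my-bucket")
                .key("checkpoints/chk-42/part-0")
                .serverSideEncryption(ServerSideEncryption.AWS_KMS)
                .ssekmsKeyId("arn:aws:kms:eu-west-1:111122223333:key/<key-id>")
                .build();
    }
}
```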
> 7. This FLIP suggests that we want to replace Flink dependency on hadoop/presto. Are we considering having some "uber" FLIP covering implementation for Azure/GCP as well?

The current goal is specifically to resolve the S3 "Jar Hell" and functional gaps (like the missing RecoverableWriter in Presto). I'm very open to collaborating on an "uber" FLIP for other cloud providers once we establish the pattern here.

> 8. FLIP suggests that we can significantly decrease packaged JAR size, could we provide guidance on the size of package SDK native FS with shaded dependencies to strengthen this selling point?

My latest builds show the legacy Hadoop-based JAR at 30MB, while the native shaded JAR is approximately 13MB, a reduction of over 50%:

  30M Dec 7 02:29 flink-s3-fs-hadoop-2.3-SNAPSHOT.jar
  13M Feb 11 17:52 flink-s3-fs-native-2.3-SNAPSHOT.jar

I have updated the FLIP with the required details. PTAL.

Cheers,
Samrat

On Sat, Feb 14, 2026 at 1:10 AM Samrat Deb <[email protected]> wrote:

> Thank you, Aleksandr Iushmanov, for reviewing the proposal.
>
> Please find the responses below to high level questions
>
> 1. We discuss that multipart upload has a minimum part size of 5 MB, does it mean that we are limited to "commit" less than 5 MB of data? Would it mean that users with low traffic would have large end to end latency or is it still possible to "commit" the data on checkpoint and restart multipart upload?
>
> Users with low traffic are not penalised with high latency. Although S3 requires 5MB, which is just a default number set for a multipart "part," Flink handles the "tail" (data < 5MB) by serializing the raw bytes directly into the checkpoint state (the CommitRecoverable object). This ensures the checkpoint is durable from the start.
>
> 2. To gain trust in this new file system, we need extensive testing of failover/recovery and ensure it doesn't lead to data loss / object leaks / memory leaks etc. Have we already covered some of the basic durability testing as part of PoC or is it a part of the testing plan?
>
> Durability has been a primary focus of the PoC.
> a) The RecoverableMultiPartUploadImplTest uses a StubMultiPartUploader to simulate network failures during part uploads.
> b) The test verifies that the S3Recoverable state correctly captures ETags and part numbers, ensuring that a recovery attempt correctly identifies which parts are already on S3 and which must be re-uploaded from the local buffer.
> c) For Object Leak Prevention, the S3Committer includes logic to check whether a file was already committed during recovery, preventing duplicate object creation or orphaned MPUs.
> d) I conducted thorough internal testing with a large state and approximately 140 TB of data written in streaming mode to S3 using the flink-native-s3-fs. No anomalies or data integrity issues were discovered.
>
> On Mon, Feb 9, 2026 at 11:07 PM Aleksandr Iushmanov <[email protected]> wrote:
>
>> Hi Samrat,
>>
>> Thank you for putting a very detailed FLIP together!
>>
>> I have a few suggestions to strengthen the proposal:
>> 1. Could we create a "Public interfaces" section? At the moment, proposed interfaces are to be found in multiple parts of the doc and it makes it harder to get general direction.
>> 2. Current PoC implementation contains more configurations than is outlined in the FLIP, I understand that this part will be evolving and it would be good to have a general review of public contracts as part of the FLIP.
>> 3. Could we call out our testing strategy on the path to production readiness? Will we mark configurations that enable this feature @experimental? What would be our acceptance criteria to consider it production ready?
>> 4. I assume we imply full state compatibility during migration through "load with legacy hadoop, then write using new new fs", should we expand on migration strategy to ensure that we have a clear path forward? For example, would migration involve setting up both schemas (s3a with legacy as recovery path + s3 with new FS as checkpoint path) and packaging both implementations in the `plugins` directory to perform transition?
>> 5. CRT support is called out in the FLIP, but doesn't seem to be a part of PoC implementation, are we going to add it as a follow up?
>> 6. It looks like PoC implementation already supports server side encryption for SSE-KMS, so it would be great to call this out in the FLIP. At a glance adding support for other SSE approaches (like SSE-C and Client side encryption) is not that straightforward with PoC implementation as SSE-KMS. Is it worth considering it as a child FLIP for prod migration?
>> 7. This FLIP suggests that we want to replace Flink dependency on hadoop/presto. Are we considering having some "uber" FLIP covering implementation for Azure/GCP as well?
>> 8. FLIP suggests that we can significantly decrease packaged JAR size, could we provide guidance on the size of package SDK native FS with shaded dependencies to strengthen this selling point?
>>
>> I also have are a couple of high level questions:
>>
>> 1. We discuss that multipart upload has a minimum part size of 5 MB, does it mean that we are limited to "commit" less than 5 MB of data? Would it mean that users with low traffic would have large end to end latency or is it still possible to "commit" the data on checkpoint and restart multipart upload?
>>
>> 2. To gain trust in this new file system, we need extensive testing of failover/recovery and ensure it doesn't lead to data loss / object leaks / memory leaks etc. Have we already covered some of the basic durability testing as part of PoC or is it a part of the testing plan?
>>
>> Kind regards,
>> Alex
>>
>> On Fri, 6 Feb 2026 at 09:17, Samrat Deb <[email protected]> wrote:
>> >
>> > Hi everyone,
>> >
>> > Following up on our earlier Thread[1] regarding the architectural fragmentation of S3 support, I would like to formally present the progress on introducing a native S3 filesystem for Flink.
>> >
>> > The current "dual-connector" ecosystem—split between flink-s3-fs-hadoop and flink-s3-fs-presto—has reached its technical limits. The Hadoop-based implementation introduces significant dependency bloat and persistent classpath conflicts, while the Presto-based connector lacks RecoverableWriter forcing users to manage multiple configurations for exactly-once sinks.
>> >
>> > To resolve this, I am proposing FLIP-555: Flink Native S3 FileSystem[2]. This implementation is built directly on the AWS SDK for Java v2, providing a unified, high-performance, and Hadoop-free solution for all S3 interactions.
>> >
>> > I have conducted benchmarking comparing the native implementation against the existing Presto-based filesystem. The initial results are highly motivating, with a visible performance gain. You can find the detailed performance analysis here[3]
>> >
>> > Following offline discussions with Piotr Nowojski and Gabor Somogyi, the POC and benchmarking results are good enough to validate that Native S3 FileSystem would be a valuable addition to Flink.
>> >
>> > With the addition of the Native S3 FileSystem, I have also discussed briefly the Deprecation Strategy to ensure operational stability in the FLIP.
>> >
>> > 1. Phase 1: Introduce flink-s3-fs-native as an optional plugin for community validation.
>> > 2. Phase 2: Promote the native connector to the recommended default once feature parity and stability are proven.
>> > 3. Phase 3: Formally deprecate the legacy Hadoop and Presto connectors in a future major release.
>> >
>> > Looking forward to your feedback and suggestions on the design and implementation details outlined in the FLIP.
>> >
>> > Cheers,
>> > Samrat
>> >
>> > [1] https://lists.apache.org/thread/2bllhqlbv0pz6t95tsjbszpm9bp9911c
>> > [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-555%3A+Flink+Native+S3+FileSystem
>> > [3] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=406620396
