Hi Alex,

Thank you for the detailed feedback.
Please find my responses to your suggestions and questions below:

> 1. Could we create a "Public interfaces" section? At the moment, proposed interfaces are to be found in multiple parts of the doc and it makes it harder to get general direction.

I have consolidated the public interfaces into a dedicated section in the FLIP. This covers the NativeS3FileSystem, the NativeS3RecoverableWriter, and the internal but critical S3AccessHelper, which abstracts our SDK interactions.

> 2. Current PoC implementation contains more configurations than is outlined in the FLIP, I understand that this part will be evolving and it would be good to have a general review of public contracts as part of the FLIP.

You are correct; the PoC had diverged slightly. I have updated the FLIP with an exhaustive list of all public configuration keys. We are standardizing on the s3. prefix (e.g., s3.access-key, s3.upload.min-part-size) to provide a clean break from the legacy presto.s3. and s3a. namespaces.

> 3. Could we call out our testing strategy on the path to production readiness? Will we mark configurations that enable this feature @experimental? What would be our acceptance criteria to consider it production ready?

We will mark the initial release of the native filesystem as @Experimental. Acceptance criteria for "Production Ready" status will include:
a) Full functional parity with the legacy connectors (including PathsCopyingFileSystem support).
b) Zero reported data integrity issues over a full release cycle.
c) Stable memory profiles during high-concurrency recovery.

> 4. I assume we imply full state compatibility during migration through "load with legacy hadoop, then write using new new fs", should we expand on migration strategy to ensure that we have a clear path forward? For example, would migration involve setting up both schemas (s3a with legacy as recovery path + s3 with new FS as checkpoint path) and packaging both implementations in the `plugins` directory to perform transition?

Currently, Flink typically loads one S3 plugin at a time. However, flink-s3-fs-native is designed to support multiple schemes (including s3:// and s3a://) in the PoC to facilitate easier cut-overs. Migration involves switching the plugin JAR; since the native connector respects the standard S3 object format, it can "write forward" from existing state.

To execute the migration for any existing job, an engineer only needs to perform a "JAR swap" in the Flink plugins/ directory:
1. Delete the legacy connector (e.g., flink-s3-fs-hadoop-*.jar).
2. Add the flink-s3-fs-native-*.jar.
3. Update the legacy configuration keys (presto.s3.* or s3a.*) to the native connector's own s3.* settings; the native connector provides a one-to-one equivalent for these, so credentials and endpoints are carried over correctly (a sketch of the resulting configuration follows this list).
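To make step 3 concrete, a minimal flink-conf.yaml after the swap could look roughly like the snippet below. Apart from s3.access-key and s3.upload.min-part-size mentioned above, the key names here are placeholders on my side; the configuration table in the FLIP is the authoritative list.

```
# Illustrative only - see the FLIP's configuration table for the final key names.
s3.access-key: <access-key>
s3.secret-key: <secret-key>                      # placeholder key name
s3.endpoint: https://s3.eu-west-1.amazonaws.com  # placeholder key name
s3.upload.min-part-size: 5m
```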
> 5. CRT support is called out in the FLIP, but doesn't seem to be a part of PoC implementation, are we going to add it as a follow up?

Yes. While the initial implementation prioritizes the Netty-based asynchronous client for stability, the architecture is ready for the AWS CRT client via the S3TransferManager. I plan to add this as a high-priority follow-up once the core logic is merged; a rough sketch of how the CRT client would be wired in is below.
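This is only a sketch of the intended wiring, not the final code: it assumes the AWS CRT client dependency is on the classpath, and the tuning values are illustrative.

```
import software.amazon.awssdk.services.s3.S3AsyncClient;
import software.amazon.awssdk.transfer.s3.S3TransferManager;

public final class CrtTransferManagerSketch {

    // Builds a TransferManager backed by the CRT-based async S3 client.
    static S3TransferManager createTransferManager() {
        S3AsyncClient crtClient = S3AsyncClient.crtBuilder()
                .targetThroughputInGbps(10.0)             // illustrative tuning value
                .minimumPartSizeInBytes(8L * 1024 * 1024) // illustrative tuning value
                .build();

        return S3TransferManager.builder()
                .s3Client(crtClient)
                .build();
    }
}
```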
> 6. It looks like PoC implementation already supports server side encryption for SSE-KMS, so it would be great to call this out in the FLIP. At a glance adding support for other SSE approaches (like SSE-C and Client side encryption) is not that straightforward with PoC implementation as SSE-KMS. Is it worth considering it as a child FLIP for prod migration?

I've called out the existing SSE-KMS support in the FLIP. I agree that SSE-C and client-side encryption are more complex; I have marked those as "Phase 2" tasks or potential child FLIPs to avoid blocking the primary release. For reference, a simplified sketch of how SSE-KMS is expressed with the SDK v2 request builders is below.
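Roughly speaking, SSE-KMS with the AWS SDK v2 amounts to setting the encryption attributes on the request builders; the snippet below is a simplified illustration (bucket, object key, and KMS key ARN are placeholders), not the actual PoC code.

```
import software.amazon.awssdk.services.s3.model.CreateMultipartUploadRequest;
import software.amazon.awssdk.services.s3.model.ServerSideEncryption;

public final class SseKmsSketch {

    // Attaches SSE-KMS attributes to a multipart upload request
    // (placeholder bucket, object key, and KMS key ARN).
    static CreateMultipartUploadRequest sseKmsUploadRequest() {
        return CreateMultipartUploadRequest.builder()
                .bucket("my-bucket")
                .key("checkpoints/chk-42/part-0")
                .serverSideEncryption(ServerSideEncryption.AWS_KMS)
                .ssekmsKeyId("arn:aws:kms:eu-west-1:111122223333:key/<key-id>")
                .build();
    }
}
```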
> 7. This FLIP suggests that we want to replace Flink dependency on hadoop/presto. Are we considering having some "uber" FLIP covering implementation for Azure/GCP as well?

The current goal is specifically to resolve the S3 "Jar Hell" and functional gaps (like the missing RecoverableWriter in Presto). I'm very open to collaborating on an "uber" FLIP for other cloud providers once we establish the pattern here.

> 8. FLIP suggests that we can significantly decrease packaged JAR size, could we provide guidance on the size of package SDK native FS with shaded dependencies to strengthen this selling point?

My latest builds show the legacy Hadoop-based JAR at 30MB, while the native shaded JAR is approximately 13MB, a reduction of over 50%:

  30M Dec 7 02:29 flink-s3-fs-hadoop-2.3-SNAPSHOT.jar
  13M Feb 11 17:52 flink-s3-fs-native-2.3-SNAPSHOT.jar

I have updated the FLIP with the required details. PTAL.

Cheers,
Samrat

On Sat, Feb 14, 2026 at 1:10 AM Samrat Deb <[email protected]> wrote:

> Thank you, Aleksandr Iushmanov, for reviewing the proposal.
>
> Please find the responses below to high level questions
>
> 1. We discuss that multipart upload has a minimum part size of 5 MB, does it mean that we are limited to "commit" less than 5 MB of data? Would it mean that users with low traffic would have large end to end latency or is it still possible to "commit" the data on checkpoint and restart multipart upload?
>
> Users with low traffic are not penalised with high latency. Although S3 requires 5MB, which is just a default number set for a multipart "part," Flink handles the "tail" (data < 5MB) by serializing the raw bytes directly into the checkpoint state (the CommitRecoverable object). This ensures the checkpoint is durable from the start.
>
> 2. To gain trust in this new file system, we need extensive testing of failover/recovery and ensure it doesn't lead to data loss / object leaks / memory leaks etc. Have we already covered some of the basic durability testing as part of PoC or is it a part of the testing plan?
>
> Durability has been a primary focus of the PoC.
> a) The RecoverableMultiPartUploadImplTest uses a StubMultiPartUploader to simulate network failures during part uploads.
> b) The test verifies that the S3Recoverable state correctly captures ETags and part numbers, ensuring that a recovery attempt correctly identifies which parts are already on S3 and which must be re-uploaded from the local buffer.
> c) For Object Leak Prevention, the S3Committer includes logic to check whether a file was already committed during recovery, preventing duplicate object creation or orphaned MPUs.
> d) I conducted thorough internal testing with a large state and approximately 140 TB of data written in streaming mode to S3 using the flink-native-s3-fs. No anomalies or data integrity issues were discovered.
>
> On Mon, Feb 9, 2026 at 11:07 PM Aleksandr Iushmanov <[email protected]> wrote:
>
>> Hi Samrat,
>>
>> Thank you for putting a very detailed FLIP together!
>>
>> I have a few suggestions to strengthen the proposal:
>> 1. Could we create a "Public interfaces" section? At the moment, proposed interfaces are to be found in multiple parts of the doc and it makes it harder to get general direction.
>> 2. Current PoC implementation contains more configurations than is outlined in the FLIP, I understand that this part will be evolving and it would be good to have a general review of public contracts as part of the FLIP.
>> 3. Could we call out our testing strategy on the path to production readiness? Will we mark configurations that enable this feature @experimental? What would be our acceptance criteria to consider it production ready?
>> 4. I assume we imply full state compatibility during migration through "load with legacy hadoop, then write using new new fs", should we expand on migration strategy to ensure that we have a clear path forward? For example, would migration involve setting up both schemas (s3a with legacy as recovery path + s3 with new FS as checkpoint path) and packaging both implementations in the `plugins` directory to perform transition?
>> 5. CRT support is called out in the FLIP, but doesn't seem to be a part of PoC implementation, are we going to add it as a follow up?
>> 6. It looks like PoC implementation already supports server side encryption for SSE-KMS, so it would be great to call this out in the FLIP. At a glance adding support for other SSE approaches (like SSE-C and Client side encryption) is not that straightforward with PoC implementation as SSE-KMS. Is it worth considering it as a child FLIP for prod migration?
>> 7. This FLIP suggests that we want to replace Flink dependency on hadoop/presto. Are we considering having some "uber" FLIP covering implementation for Azure/GCP as well?
>> 8. FLIP suggests that we can significantly decrease packaged JAR size, could we provide guidance on the size of package SDK native FS with shaded dependencies to strengthen this selling point?
>>
>> I also have are a couple of high level questions:
>>
>> 1. We discuss that multipart upload has a minimum part size of 5 MB, does it mean that we are limited to "commit" less than 5 MB of data? Would it mean that users with low traffic would have large end to end latency or is it still possible to "commit" the data on checkpoint and restart multipart upload?
>>
>> 2. To gain trust in this new file system, we need extensive testing of failover/recovery and ensure it doesn't lead to data loss / object leaks / memory leaks etc. Have we already covered some of the basic durability testing as part of PoC or is it a part of the testing plan?
>>
>> Kind regards,
>> Alex
>>
>> On Fri, 6 Feb 2026 at 09:17, Samrat Deb <[email protected]> wrote:
>> >
>> > Hi everyone,
>> >
>> > Following up on our earlier Thread[1] regarding the architectural fragmentation of S3 support, I would like to formally present the progress on introducing a native S3 filesystem for Flink.
>> >
>> > The current "dual-connector" ecosystem—split between flink-s3-fs-hadoop and flink-s3-fs-presto—has reached its technical limits. The Hadoop-based implementation introduces significant dependency bloat and persistent classpath conflicts, while the Presto-based connector lacks RecoverableWriter forcing users to manage multiple configurations for exactly-once sinks.
>> >
>> > To resolve this, I am proposing FLIP-555: Flink Native S3 FileSystem[2]. This implementation is built directly on the AWS SDK for Java v2, providing a unified, high-performance, and Hadoop-free solution for all S3 interactions.
>> >
>> > I have conducted benchmarking comparing the native implementation against the existing Presto-based filesystem. The initial results are highly motivating, with a visible performance gain. You can find the detailed performance analysis here[3]
>> >
>> > Following offline discussions with Piotr Nowojski and Gabor Somogyi, the POC and benchmarking results are good enough to validate that Native S3 FileSystem would be a valuable addition to Flink.
>> >
>> > With the addition of the Native S3 FileSystem, I have also discussed briefly the Deprecation Strategy to ensure operational stability in the FLIP.
>> >
>> > 1. Phase 1: Introduce flink-s3-fs-native as an optional plugin for community validation.
>> > 2. Phase 2: Promote the native connector to the recommended default once feature parity and stability are proven.
>> > 3. Phase 3: Formally deprecate the legacy Hadoop and Presto connectors in a future major release.
>> >
>> > Looking forward to your feedback and suggestions on the design and implementation details outlined in the FLIP.
>> >
>> > Cheers,
>> > Samrat
>> >
>> > [1] https://lists.apache.org/thread/2bllhqlbv0pz6t95tsjbszpm9bp9911c
>> > [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-555%3A+Flink+Native+S3+FileSystem
>> > [3] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=406620396
