Hi everyone,

I'd like to start a discussion on FLIP-597: Hadoop-less Flink Filesystems [1].

As FLIP-555 [2] established, Flink's Hadoop-based filesystem plugins carry significant maintenance burdens: transitive dependency conflicts, classpath issues, and CVE exposure. FLIP-555 addressed this for S3 by introducing a native SDK-based filesystem. This FLIP extends that effort as an umbrella for all cloud providers (Azure, GCS, and potentially others), with the end goal of deprecating the Hadoop-based variants.

The FLIP proposes common, cloud-agnostic I/O abstractions, placed in flink-core and shared by all SDK-native filesystem implementations, as a step towards Hadoop-less Flink deployments:

- ObjectStorageInputStream / ObjectStorageOutputStream: thread-safe stream implementations.
- InputStreamExtension / InputStreamOpener: a composable read pipeline supporting buffering and decryption via decoration.
- WriteContext / ReadContext: metadata descriptors that enable features such as client-side encryption without changing the FileSystem API.

There are a number of open questions for discussion:

1. Should a cloud-agnostic RecoverableWriter abstraction be part of this FLIP, given that implementations across clouds may be too specific for common abstractions to be useful?
2. Should PathsCopyingFileSystem (bulk copy) be in scope? Transfer managers vary across clouds, and currently we only support this implementation for AWS.
3. Should this FLIP include an ObjectStorageOperations interface (analogous to FLIP-555's S3AccessHelper) for a generalized ObjectStorageFileSystem base, or should we defer it as a premature generalisation of the testing setup? The main motivation would be the ability to verify the common filesystem parts with test doubles, reducing reliance on libraries such as MinIO [3] (archived), LocalStack [4] (archived), and Azurite [5] (no SDK V2 support), which show declining support.

Looking forward to your feedback.

Best regards,
Aleksandr Iushmanov

[1] https://cwiki.apache.org/confluence/x/9gDuGQ
[2] https://cwiki.apache.org/confluence/x/uYqmFw
[3] https://github.com/minio/minio
[4] https://github.com/localstack/localstack
[5] https://github.com/Azure/Azurite
