Hi everyone,

I'd like to start a discussion on FLIP-597: Hadoop-less Flink Filesystems
[1].

As FLIP-555 [2] established, Flink's Hadoop-based filesystem plugins
carry significant maintenance burdens: transitive dependency conflicts,
classpath issues, and CVE exposure. FLIP-555 addressed this for S3 by
introducing a native SDK-based filesystem. This FLIP extends that effort
as an umbrella for all cloud providers (Azure, GCS, and potentially
others) with the end goal of deprecating the Hadoop-based variants.

The FLIP proposes common, cloud-agnostic I/O abstractions placed in
flink-core that all SDK-native filesystem implementations share as
a step towards Hadoop-less Flink deployments:

- ObjectStorageInputStream / ObjectStorageOutputStream: thread-safe
  stream implementations
- InputStreamExtension / InputStreamOpener: composable read pipeline
  supporting buffering and decryption via decoration
- WriteContext / ReadContext: metadata descriptors enabling features
  like client-side encryption without changing the FileSystem API
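
To make the decoration idea concrete, here is a minimal sketch of how such
a read pipeline could compose. The interface names come from the list
above, but the method signatures, the compose helper, and the XOR
"decryption" are assumptions made purely for illustration; they are not
the API proposed in the FLIP.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

final class ReadPipelineSketch {

    /** Hypothetical signature: opens the raw object stream for a key. */
    interface InputStreamOpener {
        InputStream open(String key) throws IOException;
    }

    /** Hypothetical signature: wraps a stream with one extra capability. */
    interface InputStreamExtension {
        InputStream extend(InputStream in) throws IOException;
    }

    /** Stand-in for decryption: XORs every byte with a fixed key. Not real cryptography. */
    static final class XorInputStream extends FilterInputStream {
        private final int key;

        XorInputStream(InputStream in, int key) {
            super(in);
            this.key = key;
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            return b < 0 ? b : (b ^ key) & 0xFF;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            for (int i = 0; i < n; i++) {
                buf[off + i] = (byte) (buf[off + i] ^ key);
            }
            return n;
        }
    }

    /** Applies the extensions in order, so the last one is the outermost decorator. */
    static InputStreamOpener compose(InputStreamOpener base, List<InputStreamExtension> extensions) {
        return key -> {
            InputStream in = base.open(key);
            for (InputStreamExtension extension : extensions) {
                in = extension.extend(in);
            }
            return in;
        };
    }

    /** Stores an XOR-"encrypted" object in memory and reads it back through the pipeline. */
    static String demo() {
        byte[] stored = "hello".getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < stored.length; i++) {
            stored[i] = (byte) (stored[i] ^ 0x5A);
        }
        Map<String, byte[]> bucket = Map.of("data/part-0", stored);

        InputStreamOpener raw = key -> new ByteArrayInputStream(bucket.get(key));
        InputStreamExtension decrypt = in -> new XorInputStream(in, 0x5A);
        InputStreamExtension buffer = BufferedInputStream::new;
        InputStreamOpener pipeline = compose(raw, List.of(decrypt, buffer));

        try (InputStream in = pipeline.open("data/part-0")) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point of the sketch is only that buffering and decryption stay
independent wrappers, so a provider-specific filesystem would supply the
opener and pick the extensions it needs.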

There are a number of open questions for discussion:
1. Should a cloud-agnostic RecoverableWriter abstraction be part of
   this FLIP, given that implementations across clouds may be too specific
   for common abstractions to be useful?
2. Should PathsCopyingFileSystem (bulk copy) be in scope? Transfer
   managers vary across clouds, and currently we only support this
   implementation in AWS.
3. Should this FLIP include an ObjectStorageOperations interface
   (analogous to FLIP-555's S3AccessHelper) for a generalized
   ObjectStorageFileSystem base, or should we defer it as a premature
   generalisation of the testing setup? The main motivation would be the
   ability to verify common FS parts with test doubles, reducing reliance
   on other libraries such as MinIO [3] (archived), LocalStack [4]
   (archived), and Azurite [5] (no SDK V2 support), which show a decline
   in support.
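
For question 3, a rough sketch of what testing against a test double
could look like. Only the name ObjectStorageOperations comes from the
question above; the four methods, the in-memory implementation, and the
directoryExists helper are hypothetical and only meant to show the shape
of the idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ConcurrentSkipListMap;

final class InMemoryObjectStorageSketch {

    /** Hypothetical minimal surface; the real interface would be defined by the FLIP. */
    interface ObjectStorageOperations {
        void putObject(String key, byte[] data);

        Optional<byte[]> getObject(String key);

        boolean deleteObject(String key);

        List<String> listKeys(String prefix);
    }

    /** Test double that keeps objects in a sorted in-memory map instead of a cloud bucket. */
    static final class InMemoryOperations implements ObjectStorageOperations {
        private final ConcurrentSkipListMap<String, byte[]> objects = new ConcurrentSkipListMap<>();

        @Override
        public void putObject(String key, byte[] data) {
            objects.put(key, data.clone());
        }

        @Override
        public Optional<byte[]> getObject(String key) {
            byte[] data = objects.get(key);
            return data == null ? Optional.empty() : Optional.of(data.clone());
        }

        @Override
        public boolean deleteObject(String key) {
            return objects.remove(key) != null;
        }

        @Override
        public List<String> listKeys(String prefix) {
            List<String> keys = new ArrayList<>();
            // Keys sharing a prefix are contiguous in sorted order, so stop at the first mismatch.
            for (String key : objects.tailMap(prefix, true).keySet()) {
                if (!key.startsWith(prefix)) {
                    break;
                }
                keys.add(key);
            }
            return keys;
        }
    }

    /** Example of provider-independent logic: a "directory" exists if any key has its prefix. */
    static boolean directoryExists(ObjectStorageOperations ops, String dir) {
        String prefix = dir.endsWith("/") ? dir : dir + "/";
        return !ops.listKeys(prefix).isEmpty();
    }

    public static void main(String[] args) {
        ObjectStorageOperations ops = new InMemoryOperations();
        ops.putObject("checkpoints/chk-1/_metadata", new byte[] {1, 2, 3});
        System.out.println(directoryExists(ops, "checkpoints"));
        System.out.println(directoryExists(ops, "savepoints"));
    }
}
```

Logic like directoryExists could then be verified once against the
in-memory double, leaving only the thin provider-specific implementations
to be exercised against real services or emulators.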

Looking forward to your feedback.

Best regards,
Aleksandr Iushmanov

[1] https://cwiki.apache.org/confluence/x/9gDuGQ
[2] https://cwiki.apache.org/confluence/x/uYqmFw
[3] https://github.com/minio/minio
[4] https://github.com/localstack/localstack
[5] https://github.com/Azure/Azurite
