Hi,

While following the ongoing work on adding AWS SDK v2 support to
flink-s3-fs-hadoop [1], I realised that Flink's S3 filesystem landscape
has changed recently and has reached a point where a wider discussion is
needed to agree on the direction forward.

Today, Flink effectively has three S3 filesystem implementations:

   - flink-s3-fs-hadoop
   - flink-s3-fs-presto
   - native-s3-fs, recently introduced through FLIP-555 [2]

native-s3-fs already provides AWS SDK v2 support by default. It also
supports the FileSystem source/sink along with RecoverableWriter for
checkpointing. As discussed in the thread [3], the goal is to deprecate
flink-s3-fs-hadoop and flink-s3-fs-presto going forward.

I'm unclear on how the overall S3 story is expected to evolve. Should the
community expect to maintain multiple S3 implementations long term, each
serving different use cases, or is there an eventual consolidation strategy
in mind?

I am not proposing any specific direction here. I am mainly looking to
understand the long-term vision so that ongoing efforts around S3 support
can be evaluated in that broader context.


[1] https://github.com/apache/flink/pull/27026
[2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-555%3A+Flink+Native+S3+FileSystem
[3] https://lists.apache.org/thread/2bllhqlbv0pz6t95tsjbszpm9bp9911c

Best,
Samrat
