GitHub user StephanEwen commented on the issue:
https://github.com/apache/flink/pull/6108
@kl0u, please link the issue once you have created it.
This is still at a very early stage, in design discussions between @kl0u,
@aljoscha, and me.
The main points of the rewrite are:
- Use Flink's FileSystem abstraction, to make it work with shaded S3,
Swift, etc., and to provide a simpler interface
- Add a proper "ChunkedWriter" abstraction to the FileSystems, which
handles write, persist-on-checkpoint, and rollback-to-checkpoint in a
FileSystem-specific way. For example, use truncate()/append() on POSIX and
HDFS, use multipart uploads on S3, ...
- Add support for gathering large chunks across checkpoints, to make
Parquet and ORC compression more effective.
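
To make the "ChunkedWriter" idea more concrete, here is a rough sketch of the
write / persist-on-checkpoint / rollback-to-checkpoint contract. Nothing here
is an existing Flink API: the interface name is taken from the bullet above,
but the method names, the use of a byte offset as the resume handle, and the
in-memory TruncatingChunkedWriter are all assumptions made for illustration.
The in-memory class only mimics what a truncate()-capable file system could
do; an S3 variant would presumably track multipart upload parts instead.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical interface for illustration only; not an actual Flink API. */
interface ChunkedWriter {
    /** Appends data to the in-progress file. */
    void write(byte[] data);

    /** Called on checkpoint; returns a handle to resume from (assumed: a byte offset). */
    long persist();

    /** Called on recovery; discards everything written after the given handle. */
    void rollbackTo(long handle);
}

/** In-memory stand-in for a file system that supports truncate()/append(). */
final class TruncatingChunkedWriter implements ChunkedWriter {
    private byte[] buffer = new byte[0];

    @Override
    public void write(byte[] data) {
        // Append: grow the buffer and copy the new bytes to the end.
        byte[] grown = Arrays.copyOf(buffer, buffer.length + data.length);
        System.arraycopy(data, 0, grown, buffer.length, data.length);
        buffer = grown;
    }

    @Override
    public long persist() {
        // A real implementation would flush/sync here; we only report the valid length.
        return buffer.length;
    }

    @Override
    public void rollbackTo(long handle) {
        if (handle < 0 || handle > buffer.length) {
            throw new IllegalArgumentException("invalid handle: " + handle);
        }
        // Truncate back to the length recorded at the checkpoint.
        buffer = Arrays.copyOf(buffer, (int) handle);
    }

    String contents() {
        return new String(buffer, StandardCharsets.UTF_8);
    }
}

class ChunkedWriterDemo {
    public static void main(String[] args) {
        TruncatingChunkedWriter writer = new TruncatingChunkedWriter();
        writer.write("committed;".getBytes(StandardCharsets.UTF_8));
        long checkpoint = writer.persist();

        // Written after the checkpoint, then lost in a simulated failure.
        writer.write("uncommitted;".getBytes(StandardCharsets.UTF_8));
        writer.rollbackTo(checkpoint);

        writer.write("replayed;".getBytes(StandardCharsets.UTF_8));
        System.out.println(writer.contents()); // prints "committed;replayed;"
    }
}
```

The point of the sketch is that the sink would only talk to the interface,
while each FileSystem decides how persist and rollback are actually carried
out.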
---