Piotr Nowojski created FLINK-17505:
--------------------------------------
Summary: Merge small files produced by StreamingFileSink
Key: FLINK-17505
URL: https://issues.apache.org/jira/browse/FLINK-17505
Project: Flink
Issue Type: Improvement
Components: Connectors / FileSystem
Affects Versions: 1.10.0
Reporter: Piotr Nowojski
This is an alternative approach to FLINK-11499 for solving the problem of
StreamingFileSink creating many small files with bulk formats (which have to
be rolled on every checkpoint).
A merge-based approach would require converting {{StreamingFileSink}} from a
sink into an operator that works exactly as it does today, with the same
limitations (no support for arbitrary rolling policies for bulk formats),
followed by another operator tasked with merging small files in the
background.
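To make the second operator more concrete, here is a rough sketch of the
planning step it might perform: given the part files the writer has already
finished, group the small ones into batches to be rewritten as larger files.
Everything in it is an assumption for illustration only. {{MergePlanner}},
{{PartFile}}, {{planMerges}} and the {{targetBytes}} threshold are hypothetical
names, not existing Flink APIs, and the actual merge of a bulk format would
need a format-aware rewrite rather than byte concatenation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical planner for a background merge operator; not part of Flink. */
final class MergePlanner {

    /** A finished part file as reported by the upstream writer (hypothetical type). */
    static final class PartFile {
        final String path;
        final long sizeBytes;

        PartFile(String path, long sizeBytes) {
            this.path = path;
            this.sizeBytes = sizeBytes;
        }
    }

    /**
     * Greedily groups files, in arrival order, into batches whose total size
     * stays at or below targetBytes. Files that already reach targetBytes are
     * skipped, and a batch holding a single file is dropped because merging
     * it would gain nothing.
     */
    static List<List<PartFile>> planMerges(List<PartFile> finished, long targetBytes) {
        List<List<PartFile>> batches = new ArrayList<>();
        List<PartFile> current = new ArrayList<>();
        long currentBytes = 0;

        for (PartFile file : finished) {
            if (file.sizeBytes >= targetBytes) {
                continue; // already large enough, leave it untouched
            }
            if (!current.isEmpty() && currentBytes + file.sizeBytes > targetBytes) {
                // Adding this file would overflow the batch: close it and start a new one.
                if (current.size() > 1) {
                    batches.add(current);
                }
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(file);
            currentBytes += file.sizeBytes;
        }
        if (current.size() > 1) {
            batches.add(current);
        }
        return batches;
    }
}
```

With a skewed workload most files would be skipped by the size check and the
planner would return few or no batches, so the merge operator would mostly sit
idle.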
In the long term we would probably like to have both the merge operator and
the write-ahead log solution (WAL, described in FLINK-11499) as alternatives:
the WAL would behave better if small files are common, and the merge operator
could behave better if small files are rare (because of data skew, for
example).
--
This message was sent by Atlassian Jira
(v8.3.4#803005)