[ https://issues.apache.org/jira/browse/FLINK-17505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17443767#comment-17443767 ]

Oleksandr Nitavskyi commented on FLINK-17505:
---------------------------------------------

Another workaround for this problem is to write the data with Iceberg and 
consolidate the small files later using Iceberg's compaction capabilities. This 
is quite easy to set up, since Flink supports an Iceberg sink, and it appears 
to be an industry-proven setup.

This ticket can probably be closed.
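For reference, a rough sketch of that workaround using the iceberg-flink module. This is a minimal sketch, not a tested job: the table path, the input stream, and the target file size are placeholder assumptions, and the exact action API may differ between Iceberg versions.

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.data.RowData;
import org.apache.iceberg.Table;
import org.apache.iceberg.flink.TableLoader;
import org.apache.iceberg.flink.actions.Actions;
import org.apache.iceberg.flink.sink.FlinkSink;

public class IcebergWorkaroundSketch {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // Placeholder: some upstream source producing RowData.
    DataStream<RowData> input = buildInputStream(env);

    // 1) Write with the Iceberg sink instead of StreamingFileSink.
    //    Files are still committed per checkpoint, so they can be small,
    //    but the table tracks them so they can be compacted afterwards.
    TableLoader tableLoader =
        TableLoader.fromHadoopTable("hdfs://namenode:8020/warehouse/db/events"); // assumed path
    FlinkSink.forRowData(input)
        .tableLoader(tableLoader)
        .append();
    env.execute("iceberg-write");

    // 2) Consolidate small files later with Iceberg's rewrite action
    //    (can run as a separate, periodically scheduled batch job).
    tableLoader.open();
    Table table = tableLoader.loadTable();
    Actions.forTable(table)
        .rewriteDataFiles()
        .targetSizeInBytes(512L * 1024 * 1024) // assumed ~512 MB target
        .execute();
  }

  private static DataStream<RowData> buildInputStream(StreamExecutionEnvironment env) {
    throw new UnsupportedOperationException("placeholder source");
  }
}
```

In practice the compaction step would run on a schedule, separately from the streaming write job.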

> Merge small files produced by StreamingFileSink
> -----------------------------------------------
>
>                 Key: FLINK-17505
>                 URL: https://issues.apache.org/jira/browse/FLINK-17505
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / FileSystem
>    Affects Versions: 1.10.0
>            Reporter: Piotr Nowojski
>            Priority: Not a Priority
>              Labels: auto-deprioritized-major, auto-deprioritized-minor
>
> This is an alternative approach to FLINK-11499, to solve the problem of 
> creating many small files with bulk formats in StreamingFileSink (which have 
> to be rolled on every checkpoint).
> A merge-based approach would require converting {{StreamingFileSink}} from a 
> sink to an operator that works exactly as it does right now, with the same 
> limitations (no support for arbitrary rolling policies for bulk formats), 
> followed by another operator tasked with merging small files in the 
> background.
> In the long term we would probably like to have both the merge operator and 
> the write-ahead log solution (WAL, described in FLINK-11499) as alternatives: 
> the WAL would behave better when small files are common, while the merge 
> operator could behave better when small files are rare (because of data skew, 
> for example).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
