[
https://issues.apache.org/jira/browse/FLINK-19345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201330#comment-17201330
]
Jingsong Lee commented on FLINK-19345:
--------------------------------------
Hi [~kkl0u], thanks for your reply. I have discussed the unified sink with
Guowei many times offline, and in the unified sink discussion Guowei also
mentioned the relevant design and considerations for file compaction [1].
At present, the conclusion for the unified sink is that Hive partition commit
and file compaction are not supported for now. I think that putting too much
scope into the unified sink could lead to an overly complex design.
[1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-143-Unified-Sink-API-td44602.html
> Introduce File streaming sink compaction
> ----------------------------------------
>
> Key: FLINK-19345
> URL: https://issues.apache.org/jira/browse/FLINK-19345
> Project: Flink
> Issue Type: New Feature
> Components: Table SQL / Runtime
> Reporter: Jingsong Lee
> Assignee: Jingsong Lee
> Priority: Major
> Fix For: 1.12.0
>
>
> Users often complain that many small files are written out. Small files
> degrade file-reading performance and the performance of the DFS, and can
> even affect the stability of the DFS.
> Target:
> * Compact all files generated by this job in a single checkpoint.
> * With compaction, users can use a smaller checkpoint interval, even down to
> seconds.
> Document:
> https://docs.google.com/document/d/1cdlyoqgBq9yJEiHFBziimIoKHapQiEY2-0Tn8IF6G-c/edit?usp=sharing
--
This message was sent by Atlassian Jira
(v8.3.4#803005)