[jira] [Commented] (FLINK-19345) Introduce File streaming sink compaction

Jingsong Lee (Jira) Thu, 24 Sep 2020 00:37:29 -0700


    [ https://issues.apache.org/jira/browse/FLINK-19345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201330#comment-17201330 ]


Jingsong Lee commented on FLINK-19345:
--------------------------------------

Hi [~kkl0u], thanks for your reply. I have discussed the unified sink with 
Guowei offline many times, and in the unified sink discussion Guowei also 
raised the relevant design considerations for file compaction. [1]

At present, the conclusion of the unified sink discussion is that Hive 
partition commit and file compaction are not supported for now. I think that 
pulling too much scope into the unified sink could lead to an overly complex 
design.

[1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-143-Unified-Sink-API-td44602.html

> Introduce File streaming sink compaction
> ----------------------------------------
>
>                 Key: FLINK-19345
>                 URL: https://issues.apache.org/jira/browse/FLINK-19345
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table SQL / Runtime
>            Reporter: Jingsong Lee
>            Assignee: Jingsong Lee
>            Priority: Major
>             Fix For: 1.12.0
>
>
> Users often complain that many small files are written out. Small files 
> hurt file read performance and the performance of the DFS, and can even 
> affect its stability.
> Target: 
>  * Compact all files generated by this job in a single checkpoint.
>  * With compaction, users can set a smaller checkpoint interval, even down 
> to seconds.
> Document: 
> https://docs.google.com/document/d/1cdlyoqgBq9yJEiHFBziimIoKHapQiEY2-0Tn8IF6G-c/edit?usp=sharing



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
