[jira] [Closed] (FLINK-27696) Add bin-pack strategy to split the whole bucket data files into several small splits for append-only table.

Jingsong Lee (Jira) Mon, 20 Jun 2022 22:48:06 -0700


     [ https://issues.apache.org/jira/browse/FLINK-27696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Jingsong Lee closed FLINK-27696.
--------------------------------
    Resolution: Fixed

master: c80f600c149fc90f8522f43af67f0ab2a713b57f

> Add bin-pack strategy to split the whole bucket data files into several small 
> splits for append-only table.
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-27696
>                 URL: https://issues.apache.org/jira/browse/FLINK-27696
>             Project: Flink
>          Issue Type: Sub-task
>            Reporter: Zheng Hu
>            Assignee: Jingsong Lee
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: table-store-0.2.0
>
>
> We don't have to assign each task a whole bucket's data files. Instead, we 
> can use an algorithm (such as bin-packing) to split a bucket's data files 
> into multiple fragments and thereby improve job parallelism.
> For a merge tree table:
> Suppose there are now the files [1, 2] [3, 4] [5, 180] [5, 190] [200, 600] 
> [210, 700].
> Files whose key ranges do not intersect are unrelated, so we do not need to 
> put all files into one split; we can slice them into multiple splits, and 
> executing those splits in parallel is faster. We should not slice too finely 
> either: each split should be as large as possible, up to 128 MB, so we use 
> bin-packing to slice them. The final result will be:
>  * split1: [1, 2] [3, 4]
>  * split2: [5, 180] [5, 190]
>  * split3: [200, 600] [210, 700]
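
For the append-only case named in the title, a minimal sketch of packing a bucket's files purely by size might look like the following. It assumes a next-fit strategy that keeps the files in their original order and uses the 128 MB target from the description; the function name `bin_pack_ordered` and the MB units are hypothetical, and the committed change may use a different packing strategy.

```python
from typing import List

# Hypothetical target split size in MB, taken from the 128 MB figure in the issue.
TARGET_SPLIT_MB = 128


def bin_pack_ordered(file_sizes_mb: List[int], target_mb: int = TARGET_SPLIT_MB) -> List[List[int]]:
    """Next-fit packing: keep files in order and start a new split whenever
    adding the next file would exceed the target. A single file larger than
    the target ends up alone in its own split."""
    splits: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for size in file_sizes_mb:
        # Close the current split if this file would push it over the target.
        if current and current_size + size > target_mb:
            splits.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        splits.append(current)
    return splits


if __name__ == "__main__":
    print(bin_pack_ordered([100, 20, 30, 60, 200, 10]))
    # prints [[100, 20], [30, 60], [200], [10]]
```

Keeping file order makes each split a contiguous run of files; a first-fit or best-fit strategy could produce fewer splits but would reorder files.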


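The merge tree example above can be reproduced with a two-step sketch. First, files whose key ranges overlap (directly or transitively) are grouped into sections, since overlapping files have to be read together to be merged. Second, consecutive sections are packed into splits of at most 128 MB. The issue gives only key ranges, so the 40 MB size per file is a hypothetical value chosen to yield the three splits listed; the helper names `group_intersecting` and `pack_sections` are also hypothetical.

```python
from typing import List, Optional, Tuple

# (min_key, max_key, size_mb); sizes are hypothetical, the issue gives only key ranges.
DataFile = Tuple[int, int, int]


def group_intersecting(files: List[DataFile]) -> List[List[DataFile]]:
    """Sort by key range and group files whose ranges overlap into sections."""
    sections: List[List[DataFile]] = []
    section_max: Optional[int] = None
    for f in sorted(files, key=lambda f: (f[0], f[1])):
        if sections and section_max is not None and f[0] <= section_max:
            # Overlaps the current section: it must stay in the same split.
            sections[-1].append(f)
            section_max = max(section_max, f[1])
        else:
            sections.append([f])
            section_max = f[1]
    return sections


def pack_sections(sections: List[List[DataFile]], target_mb: int = 128) -> List[List[DataFile]]:
    """Next-fit packing of whole sections into splits of at most target_mb."""
    splits: List[List[DataFile]] = []
    current: List[DataFile] = []
    current_size = 0
    for section in sections:
        size = sum(f[2] for f in section)
        if current and current_size + size > target_mb:
            splits.append(current)
            current, current_size = [], 0
        current.extend(section)
        current_size += size
    if current:
        splits.append(current)
    return splits


if __name__ == "__main__":
    files = [(1, 2, 40), (3, 4, 40), (5, 180, 40), (5, 190, 40), (200, 600, 40), (210, 700, 40)]
    for split in pack_sections(group_intersecting(files)):
        print([(lo, hi) for lo, hi, _ in split])
    # prints:
    # [(1, 2), (3, 4)]
    # [(5, 180), (5, 190)]
    # [(200, 600), (210, 700)]
```

Here [1, 2] and [3, 4] form two separate sections that are packed together only because they are small, while each overlapping pair is kept intact as one section.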

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
