[
https://issues.apache.org/jira/browse/FLINK-27696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jingsong Lee closed FLINK-27696.
--------------------------------
Resolution: Fixed
master: c80f600c149fc90f8522f43af67f0ab2a713b57f
> Add bin-pack strategy to split the whole bucket data files into several small
> splits for append-only table.
> -----------------------------------------------------------------------------------------------------------
>
> Key: FLINK-27696
> URL: https://issues.apache.org/jira/browse/FLINK-27696
> Project: Flink
> Issue Type: Sub-task
> Reporter: Zheng Hu
> Assignee: Jingsong Lee
> Priority: Major
> Labels: pull-request-available
> Fix For: table-store-0.2.0
>
>
> We don't have to assign all the data files of a bucket to a single task.
> Instead, we can use an algorithm (such as bin-packing) to split a bucket's
> data files into multiple fragments to improve job parallelism.
> For merge tree table:
> Suppose there are files with these key ranges: [1, 2] [3, 4] [5, 180]
> [5, 190] [200, 600] [210, 700]
> Files whose ranges do not intersect are unrelated, so we do not need to put
> all files into one split. We can slice them into multiple splits, and running
> the splits in parallel is faster. We should not slice too finely either: each
> split should be as large as possible, up to a target of 128 MB. So we use
> BinPack to slice, and the final result will be:
> * split1: [1, 2] [3, 4]
> * split2: [5, 180] [5, 190]
> * split3: [200, 600] [210, 700]
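
A minimal sketch of the slicing described above, written in Python for brevity. It is not the table store implementation. The DataFile class, the function names, and the file sizes are assumptions made for illustration; the issue itself only gives the key ranges and the 128 MB target. Files with intersecting ranges are first grouped so they stay together, then the groups are packed in key order into splits of at most the target size.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataFile:
    """Hypothetical file descriptor: inclusive key range plus a size in MB."""
    min_key: int
    max_key: int
    size_mb: int


def group_overlapping(files: List[DataFile]) -> List[List[DataFile]]:
    """Group files whose key ranges intersect (directly or transitively)."""
    sections: List[List[DataFile]] = []
    current: List[DataFile] = []
    current_max = 0
    for f in sorted(files, key=lambda f: (f.min_key, f.max_key)):
        if current and f.min_key <= current_max:
            # Range intersects the running group, so it must stay with it.
            current.append(f)
            current_max = max(current_max, f.max_key)
        else:
            if current:
                sections.append(current)
            current = [f]
            current_max = f.max_key
    if current:
        sections.append(current)
    return sections


def pack_splits(files: List[DataFile], target_mb: int = 128) -> List[List[DataFile]]:
    """Pack groups, in key order, into splits of at most target_mb.

    A group larger than the target still gets a split of its own,
    because intersecting files are never separated.
    """
    splits: List[List[DataFile]] = []
    bin_files: List[DataFile] = []
    bin_size = 0
    for section in group_overlapping(files):
        section_size = sum(f.size_mb for f in section)
        if bin_files and bin_size + section_size > target_mb:
            # Adding this group would exceed the target: close the split.
            splits.append(bin_files)
            bin_files, bin_size = [], 0
        bin_files.extend(section)
        bin_size += section_size
    if bin_files:
        splits.append(bin_files)
    return splits


# Key ranges from the issue; the sizes (MB) are made up for this example.
files = [
    DataFile(1, 2, 30), DataFile(3, 4, 40),
    DataFile(5, 180, 60), DataFile(5, 190, 60),
    DataFile(200, 600, 64), DataFile(210, 700, 64),
]
for i, split in enumerate(pack_splits(files), start=1):
    print(f"split{i}:", [(f.min_key, f.max_key) for f in split])
```

With these assumed sizes the sketch yields the same three splits as listed above. Different sizes would change where the split boundaries fall, but intersecting files would still end up in the same split.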