[ https://issues.apache.org/jira/browse/FLINK-27002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jingsong Lee updated FLINK-27002:
---------------------------------
Description:
By default, the batch sink should sort its input by partition and `sequence.field`
to avoid generating a large number of small files. Too many small files cause poor
performance, especially on object storage.
We cannot implement `SupportsPartitioning.requiresPartitionGrouping`: we also need
to sort by `sequence.field`, otherwise we cannot determine which record is the last one.
was: We can implement `SupportsPartitioning.requiresPartitionGrouping`, so that the
planner orders the input by partition before it is written to table_store. This
avoids the OOM caused by writing too many partitions at the same time.
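To illustrate why the new description asks for sorting by partition and `sequence.field` rather than partition grouping alone, here is a minimal Python sketch. It is not Table Store code. It assumes a hypothetical writer that keeps only one partition file open at a time, and all names (`Row`, `files_written`, `sort_for_batch_write`, `last_records`, `EXAMPLE_ROWS`) are made up for illustration. Interleaved partitions force the writer to roll many small files. Sorting by partition fixes that, and ordering by the sequence field within each partition lets a later record for the same key overwrite an earlier one. Grouping by partition alone would reduce the file count but would not establish that order.

```python
from itertools import groupby
from typing import Dict, List, NamedTuple, Optional


class Row(NamedTuple):
    partition: str  # hypothetical partition value, e.g. a date string
    key: int        # hypothetical primary key
    seq: int        # hypothetical value of the sequence field
    value: str


def files_written(rows: List[Row]) -> int:
    """Count files produced by a writer that keeps one partition open at a time.

    Every change of partition closes the current file and opens a new one,
    so interleaved input produces many small files.
    """
    files = 0
    current: Optional[str] = None
    for row in rows:
        if row.partition != current:
            files += 1
            current = row.partition
    return files


def sort_for_batch_write(rows: List[Row]) -> List[Row]:
    # Cluster rows by partition, then order them by the sequence field.
    return sorted(rows, key=lambda r: (r.partition, r.seq))


def last_records(rows: List[Row]) -> Dict[str, Dict[int, Row]]:
    """Keep the last record per (partition, key).

    Expects input from sort_for_batch_write: each partition must be
    contiguous and ordered by seq, so a later row (higher seq) overwrites
    an earlier row with the same key.
    """
    result: Dict[str, Dict[int, Row]] = {}
    for partition, group in groupby(rows, key=lambda r: r.partition):
        latest: Dict[int, Row] = {}
        for row in group:
            latest[row.key] = row
        result[partition] = latest
    return result


EXAMPLE_ROWS = [
    Row("2022-04-01", 1, 2, "b"),
    Row("2022-04-02", 7, 1, "x"),
    Row("2022-04-01", 1, 1, "a"),
    Row("2022-04-02", 7, 3, "y"),
]

if __name__ == "__main__":
    print(files_written(EXAMPLE_ROWS))  # 4: the partition changes on every row
    ordered = sort_for_batch_write(EXAMPLE_ROWS)
    print(files_written(ordered))  # 2: one file per partition
    print(last_records(ordered)["2022-04-01"][1].value)  # b: seq 2 wins over seq 1
```

In this toy model the file count drops from four to two once the input is sorted, and the record with the highest sequence value per key is the one kept.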
> Optimize batch multiple partitions inserting
> --------------------------------------------
>
> Key: FLINK-27002
> URL: https://issues.apache.org/jira/browse/FLINK-27002
> Project: Flink
> Issue Type: Improvement
> Components: Table Store
> Reporter: Jingsong Lee
> Priority: Minor
> Fix For: table-store-0.3.0
--
This message was sent by Atlassian Jira
(v8.20.10#820010)