[
https://issues.apache.org/jira/browse/FLINK-25796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yingjie Cao closed FLINK-25796.
-------------------------------
Resolution: Fixed
> Avoid record copy for result partition of sort-shuffle if there are enough
> buffers for better performance
> ---------------------------------------------------------------------------------------------------------
>
> Key: FLINK-25796
> URL: https://issues.apache.org/jira/browse/FLINK-25796
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Network
> Reporter: Yingjie Cao
> Assignee: Yingjie Cao
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.15.0
>
>
> Currently, for result partition of sort-shuffle, there is extra record copy
> overhead Introduced by clustering records by subpartition index. For small
> records, this overhead can cause even 20% performance regression. This ticket
> aims to solve the problem.
> In fact, the hash-based implementation is a nature way to achieve the goal of
> sorting records by partition index. However, it incurs some serious
> weaknesses. For example, when there is no enough buffers or there is data
> skew, it can waste buffers and influence compression efficiency which can
> cause performance regression.
> This ticket tries to solve the issue by dynamically switching between the two
> implementations, that is, if there are enough buffers, the hash-based
> implementation will be used and if there is no enough buffers, the sort-based
> implementation will be used.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)