[ 
https://issues.apache.org/jira/browse/FLINK-25796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yingjie Cao closed FLINK-25796.
-------------------------------
    Resolution: Fixed

> Avoid record copy for result partition of sort-shuffle if there are enough 
> buffers for better performance
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-25796
>                 URL: https://issues.apache.org/jira/browse/FLINK-25796
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Network
>            Reporter: Yingjie Cao
>            Assignee: Yingjie Cao
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.15.0
>
>
> Currently, the result partition of sort-shuffle incurs extra record copy
> overhead, introduced by clustering records by subpartition index. For small
> records, this overhead can cause a performance regression of as much as 20%.
> This ticket aims to solve the problem.
> In fact, the hash-based implementation is a natural way to achieve the goal
> of sorting records by partition index. However, it has some serious
> weaknesses. For example, when there are not enough buffers or there is data
> skew, it can waste buffers and reduce compression efficiency, which can
> cause performance regression.
> This ticket tries to solve the issue by dynamically switching between the two
> implementations: if there are enough buffers, the hash-based implementation
> is used; otherwise, the sort-based implementation is used.
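
The selection rule described above can be sketched roughly as follows. This
is only an illustration, not the code from the actual change: the class, enum,
and method names are hypothetical and are not Flink APIs, and the threshold
(at least one buffer per subpartition counts as "enough") is an assumption,
since the ticket does not say how "enough buffers" is determined. The sketch
also makes a single decision up front, while the ticket describes the
switching as dynamic.

```java
// Hypothetical sketch; none of these names are Flink APIs.
final class WriterSelectionSketch {

    /** The two ways of grouping records by subpartition named in the ticket. */
    enum WriterKind { HASH_BASED, SORT_BASED }

    /**
     * Picks an implementation from the buffer budget.
     * Assumption: "enough buffers" means at least one buffer per subpartition.
     */
    static WriterKind choose(int availableBuffers, int numSubpartitions) {
        if (numSubpartitions <= 0) {
            throw new IllegalArgumentException("numSubpartitions must be positive");
        }
        // Enough buffers: write each record straight into its subpartition's
        // buffer (hash-based), skipping the extra copy. Otherwise fall back
        // to the sort-based path.
        return availableBuffers >= numSubpartitions
                ? WriterKind.HASH_BASED
                : WriterKind.SORT_BASED;
    }

    public static void main(String[] args) {
        System.out.println(choose(512, 100)); // plenty of buffers
        System.out.println(choose(64, 100));  // fewer buffers than subpartitions
    }
}
```

Under this assumed threshold, a partition with 100 subpartitions and 512
available buffers would take the hash-based path, and the same partition with
only 64 buffers would take the sort-based path.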



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
