[jira] [Updated] (FLINK-28512) Select HashBasedDataBuffer and SortBasedDataBuffer dynamically based on the number of network buffers can be allocated for SortMergeResultPartition

Yingjie Cao (Jira) Tue, 12 Jul 2022 04:00:25 -0700


     [ 
https://issues.apache.org/jira/browse/FLINK-28512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Yingjie Cao updated FLINK-28512:
--------------------------------
    Description: Currently, the SortMergeResultPartition chooses between 
HashBasedDataBuffer and SortBasedDataBuffer based on the number of required 
buffers per result partition, which is determined by 
'taskmanager.network.sort-shuffle.min-buffers'. If the configured value is 
large enough, HashBasedDataBuffer is used; otherwise, SortBasedDataBuffer 
is used. HashBasedDataBuffer usually performs better. However, this value 
is not easy to tune: a user who increases it for better performance can 
easily run into the 'Insufficient number of network buffers' error. This 
patch improves the situation by selecting between HashBasedDataBuffer and 
SortBasedDataBuffer dynamically, based on the number of network buffers that 
can be allocated. More specifically, if there are enough buffers at runtime, 
HashBasedDataBuffer is used; otherwise, SortBasedDataBuffer is used. To 
achieve better performance, the user only needs to increase the total amount 
of network memory per task manager.  (was: Currently, the )
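
The selection rule described above can be sketched roughly as follows. This is 
only an illustration, not the actual patch: the class name, the enum, the 
select() method, and the buffersNeededForHash threshold are all hypothetical, 
and the real threshold used by SortMergeResultPartition is not stated in this 
issue.

```java
// Hypothetical sketch of the runtime selection described in this issue.
// None of these names are taken from the Flink code base.
class DataBufferSelection {

    enum DataBufferKind { HASH_BASED, SORT_BASED }

    /**
     * Picks a data buffer kind from the number of network buffers that could
     * actually be allocated at runtime, instead of from a configured minimum.
     *
     * @param availableBuffers     buffers obtained at runtime
     * @param buffersNeededForHash assumed threshold for the hash-based buffer
     */
    static DataBufferKind select(int availableBuffers, int buffersNeededForHash) {
        if (availableBuffers < 0 || buffersNeededForHash < 0) {
            throw new IllegalArgumentException("Buffer counts must be non-negative.");
        }
        // Enough buffers: prefer the usually faster hash-based buffer.
        // Otherwise fall back to the sort-based buffer rather than failing.
        return availableBuffers >= buffersNeededForHash
                ? DataBufferKind.HASH_BASED
                : DataBufferKind.SORT_BASED;
    }

    public static void main(String[] args) {
        System.out.println(select(512, 256));
        System.out.println(select(64, 256));
    }
}
```

With a rule of this shape, a shortage of buffers leads to the sort-based 
fallback instead of an 'Insufficient number of network buffers' error, which 
is why raising the total network memory becomes the only knob left to tune.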

> Select HashBasedDataBuffer and SortBasedDataBuffer dynamically based on the 
> number of network buffers can be allocated for SortMergeResultPartition
> ---------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-28512
>                 URL: https://issues.apache.org/jira/browse/FLINK-28512
>             Project: Flink
>          Issue Type: Sub-task
>            Reporter: Yingjie Cao
>            Priority: Major
>             Fix For: 1.16.0
>
>
> Currently, the SortMergeResultPartition chooses between HashBasedDataBuffer 
> and SortBasedDataBuffer based on the number of required buffers per result 
> partition, which is determined by 
> 'taskmanager.network.sort-shuffle.min-buffers'. If the configured value is 
> large enough, HashBasedDataBuffer is used; otherwise, SortBasedDataBuffer is 
> used. HashBasedDataBuffer usually performs better. However, this value is 
> not easy to tune: a user who increases it for better performance can easily 
> run into the 'Insufficient number of network buffers' error. This patch 
> improves the situation by selecting between HashBasedDataBuffer and 
> SortBasedDataBuffer dynamically, based on the number of network buffers that 
> can be allocated. More specifically, if there are enough buffers at runtime, 
> HashBasedDataBuffer is used; otherwise, SortBasedDataBuffer is used. To 
> achieve better performance, the user only needs to increase the total amount 
> of network memory per task manager.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
