[
https://issues.apache.org/jira/browse/FLINK-31386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated FLINK-31386:
-----------------------------------
Labels: pull-request-available (was: )
> Fix the potential deadlock issue of blocking shuffle
> ----------------------------------------------------
>
> Key: FLINK-31386
> URL: https://issues.apache.org/jira/browse/FLINK-31386
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Network
> Reporter: Yingjie Cao
> Assignee: Yingjie Cao
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 1.17.0
>
>
> Currently, the SortMergeResultPartition may allocate more network buffers
> than the guaranteed size of the LocalBufferPool. As a result, some result
> partitions may need to wait for other result partitions to release the
> over-allocated network buffers before they can continue. However, the result
> partitions that have allocated more than their guaranteed buffers rely on the
> processing of input data to trigger data spilling and buffer recycling. The
> input data in turn relies on the batch reading buffers used by the
> SortMergeResultPartitionReadScheduler, which may already be taken by the
> blocked result partitions that are waiting for buffers. A deadlock then
> occurs. We can easily fix this deadlock by reserving the guaranteed buffers
> on initialization.
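
The starvation described above can be illustrated with a small toy model. This
is only a sketch under stated assumptions, not Flink's implementation: the
classes ToyGlobalPool and ToyPartition, the pool size of 8, and the guaranteed
size of 4 are all hypothetical and do not correspond to Flink's
NetworkBufferPool, LocalBufferPool, or SortMergeResultPartition APIs. It only
shows why reserving the guaranteed buffers up front stops one partition from
over-allocating buffers that another partition is counting on.

```java
import java.util.concurrent.Semaphore;

public class ReserveOnInitSketch {

    /** Hypothetical stand-in for a shared pool of network buffers. */
    static final class ToyGlobalPool {
        private final Semaphore free;

        ToyGlobalPool(int totalBuffers) {
            this.free = new Semaphore(totalBuffers);
        }

        /** Non-blocking: takes n buffers only if all n are free right now. */
        boolean tryTake(int n) {
            return free.tryAcquire(n);
        }
    }

    /** Hypothetical stand-in for a result partition with a guaranteed buffer count. */
    static final class ToyPartition {
        private final ToyGlobalPool global;
        private final int guaranteed;
        private int held;

        ToyPartition(ToyGlobalPool global, int guaranteed, boolean reserveOnInit) {
            this.global = global;
            this.guaranteed = guaranteed;
            if (reserveOnInit) {
                // The idea from the issue: claim the guaranteed buffers at initialization.
                if (!global.tryTake(guaranteed)) {
                    throw new IllegalStateException("not enough buffers to reserve");
                }
                held = guaranteed;
            }
        }

        /** Tries to take one more buffer from the shared pool without blocking. */
        boolean tryRequestOne() {
            if (global.tryTake(1)) {
                held++;
                return true;
            }
            return false;
        }

        int held() {
            return held;
        }

        boolean hasGuaranteed() {
            return held >= guaranteed;
        }
    }

    /** Returns true if partition B ends up with fewer buffers than its guaranteed size. */
    static boolean secondPartitionStarved(boolean reserveOnInit) {
        ToyGlobalPool global = new ToyGlobalPool(8);
        ToyPartition a = new ToyPartition(global, 4, reserveOnInit);
        ToyPartition b = new ToyPartition(global, 4, reserveOnInit);

        // Partition A greedily over-allocates: it takes whatever is still free.
        while (a.held() < 8 && a.tryRequestOne()) {
            // keep taking buffers
        }
        // Partition B then tries to reach its guaranteed size.
        while (b.held() < 4 && b.tryRequestOne()) {
            // keep taking buffers
        }
        return !b.hasGuaranteed();
    }

    public static void main(String[] args) {
        System.out.println("lazy allocation, B starved: " + secondPartitionStarved(false));
        System.out.println("reserve on init, B starved: " + secondPartitionStarved(true));
    }
}
```

In this model, lazy allocation lets partition A take all 8 buffers, so B never
reaches its guaranteed 4. In a real system B would then block while waiting,
which is the precondition for the deadlock described in the issue. With
reservation at initialization, A can only over-allocate from buffers that no
other partition has been guaranteed.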
--
This message was sent by Atlassian Jira
(v8.20.10#820010)