Yingjie Cao created FLINK-31386:
-----------------------------------

             Summary: Fix the potential deadlock issue of blocking shuffle
                 Key: FLINK-31386
                 URL: https://issues.apache.org/jira/browse/FLINK-31386
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Network
            Reporter: Yingjie Cao
             Fix For: 1.17.0


Currently, the SortMergeResultPartition may allocate more network buffers than
the guaranteed size of the LocalBufferPool. As a result, some result partitions
may have to wait for other result partitions to release their over-allocated
network buffers before they can continue. However, the result partitions that
have allocated more than their guaranteed buffers rely on the processing of
input data to trigger data spilling and buffer recycling. Processing the input
data in turn relies on the batch reading buffers used by the
SortMergeResultPartitionReadScheduler, and those buffers may already have been
taken by the blocked result partitions that are waiting for buffers. The result
is a deadlock. We can fix it easily by reserving the guaranteed buffers at
initialization.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
