[
https://issues.apache.org/jira/browse/FLINK-24035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17407129#comment-17407129
]
Piotr Nowojski commented on FLINK-24035:
----------------------------------------
The solution here is to always request at least a single buffer.
This is not an issue on the output side, because there are no
{{BufferListeners}} in that case. Also, if a buffer is requested on the output
side, it will be used and eventually flushed.
On the input side, on the other hand, with exclusive buffers > 0, we are already
requesting exclusive buffers in a blocking way with a timeout (FLINK-12852), so
we know that the task will be able to make progress regardless of whether we
notify about more buffers or not. With exclusive buffers = 0, this solution
requests a single floating buffer, so we will also be able to make progress.
Once data starts flowing and this single buffer is recycled, listeners will be
notified and more buffers will be requested.
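
To illustrate why holding a single buffer is enough to guarantee progress, here
is a minimal toy sketch. It is not Flink's LocalBufferPool; the class and method
names (SingleBufferProgressDemo, Pool, notifiedAfterRecycle) are hypothetical
and exist only for this illustration. The assumption is that a recycled buffer
is handed to a waiting listener first.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model only; names are hypothetical and do not match Flink's classes.
final class SingleBufferProgressDemo {
    static final class Pool {
        private final Deque<Runnable> listeners = new ArrayDeque<>();
        private int available;

        Pool(int initiallyHeld) {
            this.available = initiallyHeld;
        }

        // Hands out a buffer if one is available.
        boolean request() {
            if (available == 0) {
                return false;
            }
            available--;
            return true;
        }

        void addListener(Runnable listener) {
            listeners.add(listener);
        }

        // A returned buffer goes to a waiting listener first, otherwise back to the pool.
        void recycle() {
            Runnable listener = listeners.poll();
            if (listener != null) {
                listener.run();
            } else {
                available++;
            }
        }
    }

    // Returns how many listeners were notified after two requests and one recycle.
    static int notifiedAfterRecycle(int initiallyHeld) {
        Pool pool = new Pool(initiallyHeld);
        int[] notified = {0};
        boolean gotFirst = pool.request();
        if (!pool.request()) {
            pool.addListener(() -> notified[0]++);
        }
        if (gotFirst) {
            pool.recycle(); // only possible if a buffer was handed out at all
        }
        return notified[0];
    }
}
```

With one buffer held up front, the waiting listener is woken by the recycle. With
zero buffers, nothing is ever recycled, so in this toy model the listener waits
forever.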
> Fix the deadlock issue caused by buffer listeners may not be notified
> ---------------------------------------------------------------------
>
> Key: FLINK-24035
> URL: https://issues.apache.org/jira/browse/FLINK-24035
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Network
> Affects Versions: 1.14.0
> Reporter: Yingjie Cao
> Assignee: Yingjie Cao
> Priority: Critical
> Labels: pull-request-available
> Fix For: 1.14.0
>
>
> The buffer listeners are not notified when the local buffer pool receives
> an availability notification from the global pool. This may cause a
> deadlock:
> # A LocalBufferPool is created, but there are no available buffers in the
> global NetworkBufferPool.
> # The LocalBufferPool registers an available buffer listener to the global
> NetworkBufferPool.
> # The BufferManager requests buffers from the LocalBufferPool but no buffer
> is available. As a result, it registers an available buffer listener to the
> LocalBufferPool.
> # A buffer is recycled to the global pool and the local buffer pool is
> notified about the available buffer.
> # The local buffer pool requests the available buffer from the global pool,
> but the available buffer listener registered by the BufferManager is not
> notified. It never gets another chance to be notified, so a deadlock occurs.
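
The steps in the description above can be sketched with a small toy model. This
is an illustration under stated assumptions, not Flink's code: the names
MissedNotificationDemo, LocalPool, requestOrRegister and
onGlobalBufferAvailable are hypothetical, and the forwardNotification flag only
exists to contrast the buggy and the expected behaviour.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of the reported sequence; hypothetical names, not Flink's classes.
final class MissedNotificationDemo {
    static final class LocalPool {
        final Deque<Runnable> listeners = new ArrayDeque<>();
        final boolean forwardNotification;
        int available = 0;

        LocalPool(boolean forwardNotification) {
            this.forwardNotification = forwardNotification;
        }

        // Step 3: the caller finds no buffer and registers a listener instead.
        boolean requestOrRegister(Runnable listener) {
            if (available > 0) {
                available--;
                return true;
            }
            listeners.add(listener);
            return false;
        }

        // Steps 4 and 5: the global pool signals availability and the local pool takes the buffer.
        void onGlobalBufferAvailable() {
            available++;
            // Without forwarding, the buffer sits in the local pool and nobody is woken up.
            if (forwardNotification && !listeners.isEmpty()) {
                available--;
                listeners.poll().run();
            }
        }
    }

    // Returns true if the waiting listener was woken up.
    static boolean listenerNotified(boolean forwardNotification) {
        LocalPool pool = new LocalPool(forwardNotification);
        boolean[] notified = {false};
        pool.requestOrRegister(() -> notified[0] = true);
        pool.onGlobalBufferAvailable();
        return notified[0];
    }
}
```

In this sketch, listenerNotified(false) models the reported behaviour (the
listener is never called), while listenerNotified(true) models a pool that
passes the availability notification on to its own listeners.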