Yingjie Cao created FLINK-14872:
-----------------------------------
Summary: Potential deadlock for task reading from blocking
ResultPartition.
Key: FLINK-14872
URL: https://issues.apache.org/jira/browse/FLINK-14872
Project: Flink
Issue Type: Bug
Reporter: Yingjie Cao
Currently, the buffer pool size of an InputGate reading from a blocking
ResultPartition is unbounded, so the gate can take too many buffers. As a
result, the ResultPartition of the same task may be unable to acquire enough
core buffers, which can finally lead to a deadlock.
Consider the following case:
Core buffers are reserved for the InputGate and the ResultPartition -> The
InputGate consumes lots of Buffers (not including the buffers reserved for the
ResultPartition) -> Other tasks acquire exclusive buffers for their InputGates
and trigger a redistribution of Buffers (Buffers taken by the previous
InputGate cannot be released) -> The first task, whose InputGate uses lots of
buffers, begins to emit records but cannot acquire enough core Buffers (some
operators may not emit records immediately, or there may simply be nothing to
emit) -> Deadlock.
I think we can fix this problem by limiting the number of Buffers that can be
allocated by an InputGate which reads from a blocking ResultPartition.
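
To illustrate why a limit helps, here is a minimal toy sketch in Java. It is
not Flink code: the class BoundedGateSketch, the LocalPool type and the method
outputBuffersAfterGreedyInput are hypothetical names made up for this example,
and the model deliberately ignores core-buffer reservation and redistribution.
It only shows that a greedy input side with no cap can leave the output side
of the same task without any buffers, while a capped input side cannot.

```java
import java.util.concurrent.Semaphore;

/** Toy model only; none of these names exist in Flink. */
public class BoundedGateSketch {

    /** Hypothetical per-gate (or per-partition) pool with an upper bound. */
    static final class LocalPool {
        private final Semaphore global; // buffers shared by all local pools
        private final int maxBuffers;   // upper bound for this local pool
        private int held;               // buffers currently held by this pool

        LocalPool(Semaphore global, int maxBuffers) {
            this.global = global;
            this.maxBuffers = maxBuffers;
        }

        /** Non-blocking request; fails if the cap or the global pool is exhausted. */
        boolean tryRequest() {
            if (held >= maxBuffers || !global.tryAcquire()) {
                return false;
            }
            held++;
            return true;
        }

        int held() {
            return held;
        }
    }

    /**
     * The input side takes as many buffers as its cap allows, then the output
     * side tries to obtain outputNeeded buffers. Returns how many it got.
     */
    static int outputBuffersAfterGreedyInput(int totalBuffers, int inputCap, int outputNeeded) {
        Semaphore global = new Semaphore(totalBuffers);
        LocalPool input = new LocalPool(global, inputCap);
        LocalPool output = new LocalPool(global, outputNeeded);
        while (input.tryRequest()) {
            // input keeps buffers and never returns them, as in the scenario above
        }
        while (output.tryRequest()) {
            // output takes whatever is left, up to outputNeeded
        }
        return output.held();
    }

    public static void main(String[] args) {
        // Unbounded input: the output side is starved.
        System.out.println(outputBuffersAfterGreedyInput(10, Integer.MAX_VALUE, 4)); // prints 0
        // Input capped at 6 of 10 buffers: the output side gets all 4 it needs.
        System.out.println(outputBuffersAfterGreedyInput(10, 6, 4)); // prints 4
    }
}
```

The numbers (10 total buffers, a cap of 6, 4 buffers needed for output) are
arbitrary and chosen only to make the difference visible.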
--
This message was sent by Atlassian Jira
(v8.3.4#803005)