[
https://issues.apache.org/jira/browse/FLINK-14872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979866#comment-16979866
]
Yingjie Cao commented on FLINK-14872:
-------------------------------------
There are some choices we may consider:
# Like Blink, each buffer pool has a finite min and max size which are
calculated from the number of channels and config options like
buffers-per-channel. The JM and RM calculate memory according to the max value
each buffer pool can use, that is, the max value is guaranteed and no buffer
pool will use more than it. The weak point of this option is that guaranteeing
the max value with no floating may waste network memory.
# As mentioned in FLINK-13203, disable buffer floating entirely. The downside
is also wasted memory; compared with option 1, the upside is that the JM and RM
do not need to calculate the number of buffers.
# As mentioned in FLINK-13203, make buffers always revocable by spilling. The
problem to consider is when, how, and on which thread to spill, which is not
trivial. One choice is to allocate exclusive and core memory at setup and
notify the buffer pool owner to release buffers when not enough are available;
the NetworkBufferPool could then decide how many buffers each local pool should
free. The problems with this option are that sometimes we do not need to spill
at all and only need to wait a little longer, and that the implementation is
somewhat complicated.
# The simplest way may be to do just what we did for exclusive buffers, that
is, allocate the core (required) buffers at setup and apply a timeout. The
advantage of this option is that it does not change the behavior of the system
and is simple enough, though it does not solve the deadlock problem directly.
# Split the network memory into core and floating parts. The core buffers are
not floatable, and we only reserve buffers from the core pool; the floating
buffers are shared among all local pools. However, this decreases the number of
core buffers we can use and can lead to buffer-insufficiency problems, so users
may need to reconfigure the network buffers. Option 2 becomes a special case of
this one if we set the floating buffers to 0 (no floating). I would prefer this
solution if the side impact on users is acceptable.
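To make option 5 concrete, here is a minimal sketch (all names are hypothetical, not Flink's actual classes): each local pool reserves non-floatable core buffers at setup, while floating buffers are borrowed best-effort from a shared pool, so one pool can never consume another pool's core budget.

```java
/** Shared pool of floating buffers, handed out best-effort (sketch only). */
class SharedFloatingPool {
    private int available;

    SharedFloatingPool(int floatingBuffers) {
        this.available = floatingBuffers;
    }

    synchronized boolean tryAcquire() {
        if (available > 0) {
            available--;
            return true;
        }
        return false;
    }

    synchronized void release() {
        available++;
    }
}

/** A local pool with reserved core buffers plus borrowed floating buffers. */
class LocalPool {
    private final SharedFloatingPool floating;
    private int coreAvailable; // reserved at setup, never redistributed
    private int floatingHeld;  // floating buffers currently borrowed

    LocalPool(int coreBuffers, SharedFloatingPool floating) {
        this.coreAvailable = coreBuffers;
        this.floating = floating;
    }

    /** Tries core first, then floating; never touches other pools' cores. */
    synchronized boolean requestBuffer() {
        if (coreAvailable > 0) {
            coreAvailable--;
            return true;
        }
        if (floating.tryAcquire()) {
            floatingHeld++;
            return true;
        }
        return false; // caller must wait; other pools' core budgets are safe
    }

    synchronized void recycleFloating() {
        if (floatingHeld > 0) {
            floatingHeld--;
            floating.release();
        }
    }
}
```

Because requestBuffer() returns false rather than blocking on memory reserved for another pool, core allocations can never deadlock against each other; only the floating part is contended.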
For a short-term fix, I would prefer option 4. For a long-term solution, I
think we can consider options 3 and 5. What do you think? [~pnowojski]
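Option 4 could be sketched as follows (a hypothetical API, not the actual LocalBufferPool code): the required buffers are requested eagerly at setup, and the request fails after a timeout instead of blocking forever, turning a silent deadlock into a diagnosable error.

```java
/** Sketch: eager allocation of required buffers with a timeout (assumed API). */
class BufferAllocator {
    private int freeBuffers;

    BufferAllocator(int totalBuffers) {
        this.freeBuffers = totalBuffers;
    }

    synchronized void release(int n) {
        freeBuffers += n;
        notifyAll(); // wake any setup call waiting for required buffers
    }

    /** Waits up to timeoutMs for n buffers; returns false on timeout. */
    synchronized boolean allocateRequired(int n, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (freeBuffers < n) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return false; // surface an error instead of deadlocking
            }
            try {
                wait(remaining);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        freeBuffers -= n;
        return true;
    }
}
```

As noted above, this does not remove the cause of the deadlock; it only bounds how long a task blocks and lets the job fail with an actionable message.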
> Potential deadlock for task reading from blocking ResultPartition.
> ------------------------------------------------------------------
>
> Key: FLINK-14872
> URL: https://issues.apache.org/jira/browse/FLINK-14872
> Project: Flink
> Issue Type: Bug
> Reporter: Yingjie Cao
> Priority: Blocker
> Fix For: 1.10.0
>
>
> Currently, the buffer pool size of an InputGate reading from a blocking
> ResultPartition is unbounded, which risks using too many buffers so that the
> ResultPartition of the same task cannot acquire enough core buffers, finally
> leading to deadlock.
> Consider the following case:
> Core buffers are reserved for the InputGate and ResultPartition -> the
> InputGate consumes lots of buffers (not including the buffers reserved for
> the ResultPartition) -> other tasks acquire exclusive buffers for their
> InputGates and trigger a redistribution of buffers (buffers already taken by
> the first InputGate cannot be released) -> the first task, whose InputGate
> uses lots of buffers, begins to emit records but cannot acquire enough core
> buffers (some operators may not emit records immediately, or there is simply
> nothing to emit) -> deadlock.
>
> I think we can fix this problem by limiting the number of buffers that can
> be allocated by an InputGate which reads from a blocking ResultPartition.
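The proposed cap could look roughly like this (a sketch with hypothetical names; the cap formula mirrors the usual exclusive-plus-floating per-channel sizing and is an assumption here, not the committed fix):

```java
/**
 * Sketch: give the InputGate's local pool a hard cap derived from its channel
 * count, so a gate reading a blocking ResultPartition cannot grow without
 * bound and starve the ResultPartition of the same task.
 */
class BoundedGatePool {
    private final int maxBuffers;
    private int inUse;

    /** Cap mirrors the usual sizing: per-channel exclusive plus floating. */
    BoundedGatePool(int numChannels, int buffersPerChannel, int floatingBuffers) {
        this.maxBuffers = numChannels * buffersPerChannel + floatingBuffers;
    }

    /** Refuses requests beyond the cap instead of draining the global pool. */
    synchronized boolean requestBuffer() {
        if (inUse >= maxBuffers) {
            return false;
        }
        inUse++;
        return true;
    }

    synchronized void recycle() {
        if (inUse > 0) {
            inUse--;
        }
    }

    int maxBuffers() {
        return maxBuffers;
    }
}
```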