[ 
https://issues.apache.org/jira/browse/FLINK-14872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979866#comment-16979866
 ] 

Yingjie Cao edited comment on FLINK-14872 at 11/22/19 5:52 AM:
---------------------------------------------------------------

There are several options we may consider:
 # Like Blink, give each buffer pool a finite min and max size calculated from the number of channels and config options like buffer-per-channel. The JM and RM would do their calculations based on the max value a buffer pool can use, that is, the max value is guaranteed and no buffer pool will ever use more than it. The weak point of this option is that guaranteeing the max value with no floating buffers may waste network memory.
 # As mentioned in FLINK-13203, disable buffer floating entirely. The downside is again wasted memory; compared with option 1, the upside is that we do not need to calculate the number of buffers at the JM and RM.
 # As mentioned in FLINK-13203, make buffers always revocable by spilling. The problem we must consider is when, how, and in which thread to spill, which is not trivial. One choice is to allocate exclusive and core memory at setup and notify the buffer pool owner to release buffers when not enough buffers are available; the NetworkBufferPool could then decide how many buffers each local pool should free. One problem with this option is that sometimes we do not need to spill at all and only need to wait a bit longer. Another is that the implementation is somewhat complicated.
 # The simplest way may be to do just what we did for exclusive buffers, that is, allocate the core (required) buffers at setup with a timeout. The advantage of this option is that it does not change the behavior of the system and is simple enough, though it does not solve the deadlock problem directly.
 # Split the network memory into core and floating parts. The core buffers are not floatable, and we only reserve buffers from the core pool; the floating buffers are shared among all local pools (see the sketch below). However, this decreases the number of core buffers we can use and can lead to insufficient-buffer problems, so users may need to reconfigure the network buffers. Option 2 is a special case of this one if we set the floating buffers to 0 (no floating). I would prefer this solution if the side effect on users is acceptable.

For a short-term fix, I would prefer 4. For a long-term solution, I think we can consider 3 and 5. What do you think? [~pnowojski]
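
To make options 4 and 5 more concrete, here is a minimal standalone sketch. The class, the Semaphore-based model, and all names are illustrative assumptions, not Flink's actual LocalBufferPool/NetworkBufferPool API: core buffers are reserved eagerly with a timeout, and everything beyond the core is served from a floating budget shared by all local pools.

{code:java}
import java.time.Duration;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/**
 * Simplified model of options 4 and 5, not Flink's real buffer pool classes:
 * core buffers are reserved eagerly with a timeout, everything beyond the
 * core is served from a floating budget shared by all local pools.
 */
final class SketchLocalPool {

    enum Acquired { CORE, FLOATING, NONE }

    private final Semaphore corePermits;     // owned exclusively by this pool
    private final Semaphore floatingPermits; // shared among all local pools

    SketchLocalPool(Semaphore globalCoreBudget, int numCoreBuffers,
                    Semaphore floatingPermits, Duration setupTimeout)
            throws InterruptedException {
        // Option 4: reserve the required buffers at setup and fail fast with a
        // timeout instead of blocking forever, which turns a silent deadlock
        // into an explicit "network memory over-committed" error.
        if (!globalCoreBudget.tryAcquire(numCoreBuffers,
                setupTimeout.toMillis(), TimeUnit.MILLISECONDS)) {
            throw new IllegalStateException(
                    "Could not reserve " + numCoreBuffers
                            + " core buffers within " + setupTimeout);
        }
        this.corePermits = new Semaphore(numCoreBuffers);
        this.floatingPermits = floatingPermits;
    }

    /** Option 5: core permits are guaranteed, floating permits are best effort. */
    Acquired tryRequestBuffer() {
        if (corePermits.tryAcquire()) {
            return Acquired.CORE;      // can never be taken by another pool
        }
        if (floatingPermits.tryAcquire()) {
            return Acquired.FLOATING;  // shared, may be exhausted by other pools
        }
        return Acquired.NONE;          // caller has to wait or back-pressure
    }

    void recycleBuffer(Acquired kind) {
        if (kind == Acquired.CORE) {
            corePermits.release();
        } else if (kind == Acquired.FLOATING) {
            floatingPermits.release();
        }
    }
}
{code}

Because core permits are never lent out, an InputGate can exhaust only the floating budget, so the ResultPartition of the same task can always obtain its reserved buffers, which is exactly the guarantee the deadlock scenario below lacks today.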



> Potential deadlock for task reading from blocking ResultPartition.
> ------------------------------------------------------------------
>
>                 Key: FLINK-14872
>                 URL: https://issues.apache.org/jira/browse/FLINK-14872
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Yingjie Cao
>            Priority: Blocker
>             Fix For: 1.10.0
>
>
> Currently, the buffer pool size of an InputGate reading from a blocking 
> ResultPartition is unbounded. Such a pool can take too many buffers, which may 
> prevent the ResultPartition of the same task from acquiring enough core 
> buffers and finally lead to a deadlock.
> Consider the following case:
> Core buffers are reserved for the InputGate and ResultPartition -> the 
> InputGate consumes lots of buffers (beyond the buffers reserved for the 
> ResultPartition) -> other tasks acquire exclusive buffers for their InputGates 
> and trigger a redistribution of buffers (the buffers taken by the previous 
> InputGate cannot be released) -> the first task, whose InputGate holds lots of 
> buffers, begins to emit records but cannot acquire enough core buffers (some 
> operators may not emit records immediately, or there is simply nothing to 
> emit) -> deadlock.
>  
> I think we can fix this problem by limiting the number of buffers that can be 
> allocated by an InputGate which reads from a blocking ResultPartition.
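
For illustration, a minimal standalone sketch of the bounded-pool fix described in the ticket. The class and its Semaphore-based model are hypothetical, not Flink's actual InputGate or BufferPool API: an input gate reading from a blocking ResultPartition may take at most a fixed number of buffers from the shared network pool.

{code:java}
import java.util.concurrent.Semaphore;

/**
 * Hypothetical sketch of the proposed fix: cap the number of buffers an input
 * gate reading from a blocking ResultPartition may take from the shared pool,
 * so it can no longer starve the ResultPartition of the same task.
 */
final class BoundedBlockingGatePool {

    private final Semaphore sharedNetworkBuffers; // the global network buffer pool
    private final Semaphore gateQuota;            // per-gate upper bound

    BoundedBlockingGatePool(Semaphore sharedNetworkBuffers, int maxBuffersForThisGate) {
        this.sharedNetworkBuffers = sharedNetworkBuffers;
        this.gateQuota = new Semaphore(maxBuffersForThisGate);
    }

    /** Hands out a buffer only while the gate stays under its own cap. */
    boolean tryRequestBuffer() {
        if (!gateQuota.tryAcquire()) {
            return false;            // gate already holds its maximum
        }
        if (!sharedNetworkBuffers.tryAcquire()) {
            gateQuota.release();     // roll back, the shared pool is empty
            return false;
        }
        return true;
    }

    void recycleBuffer() {
        sharedNetworkBuffers.release();
        gateQuota.release();
    }
}
{code}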


