[ https://issues.apache.org/jira/browse/FLINK-14872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16979866#comment-16979866 ]
Yingjie Cao edited comment on FLINK-14872 at 11/22/19 5:52 AM:
---------------------------------------------------------------
There are some choices we may consider:
# Like Blink, give each buffer pool a finite min and max size calculated from the number of channels and config options like buffers-per-channel. The JM and RM account for memory according to the max value a buffer pool can use; that is, the max value is guaranteed, and no buffer pool will ever use more than it (see the sizing sketch after this list). The weak point of this option is that guaranteeing the max value with no floating may waste network memory.
# As mentioned in FLINK-13203, disallow buffer floating entirely. The downside is again wasted memory; compared with option 1, the upside is that we do not need to calculate the number of buffers at the JM and RM.
# As mentioned in FLINK-13203, make buffers always revocable by spilling. The problem we must consider is when, how, and on which thread the spilling happens, which is not trivial. One choice is to allocate the exclusive and core memory at setup and notify the buffer pool owner to release buffers when not enough are available; the NetworkBufferPool may decide how many buffers each local pool should free. One problem with this option is that sometimes we do not need to spill and only need to wait a little longer; another is that the implementation is somewhat complicated.
# The simplest way may be to do just what we did for exclusive buffers: allocate the core (required) buffers at setup and apply a timeout (see the timeout sketch after this list). The advantage of this option is that it does not change the behavior of the system and is simple enough, though it does not solve the deadlock problem directly.
# Split the network memory into core and floating parts (see the core/floating sketch after this list). The core buffers are not floatable, and we only reserve buffers from the core pool; the floating buffers are shared among all local pools. However, this decreases the core buffers we can use and can lead to insufficient buffers, so users may need to reconfigure the network memory. Option 2 is a special case of this one with the floating buffers set to 0 (no floating). I would prefer this solution if the side effect on users is acceptable.

For a short-term fix, I would prefer 4. For a long-term solution, I think we can consider 3 and 5. What do you think? [~pnowojski]
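To make option 1 concrete, here is a back-of-the-envelope sizing sketch. The names are illustrative (not necessarily the actual config keys); the point is that both bounds are pure functions of the channel count, so the JM and RM can sum the max values to know exactly how much network memory to account for:

{code:java}
/**
 * Sizing sketch for option 1 (illustrative names, not actual config keys):
 * both pool bounds are derived from the channel count alone, so summing
 * maxPoolSize over all tasks gives the memory the JM/RM must account for.
 */
final class PoolSizeCalculator {

    /** Core buffers that must always be obtainable by this pool. */
    static int minPoolSize(int numChannels, int buffersPerChannel) {
        return numChannels * buffersPerChannel;
    }

    /** Hard cap: no pool may ever use more than this. */
    static int maxPoolSize(int numChannels, int buffersPerChannel, int floatingBuffersPerGate) {
        return numChannels * buffersPerChannel + floatingBuffersPerGate;
    }
}
{code}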
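For option 4, a minimal sketch of reserving the core buffers at setup with a timeout, assuming a hypothetical CoreBufferReservation class (this is not the actual NetworkBufferPool API). A task that cannot obtain its required buffers in time fails with an error instead of blocking forever:

{code:java}
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch for option 4: reserve required buffers at setup, with a timeout. */
final class CoreBufferReservation {

    private final Object lock = new Object();
    private final Queue<byte[]> available = new ArrayDeque<>();

    CoreBufferReservation(int totalBuffers, int bufferSize) {
        for (int i = 0; i < totalBuffers; i++) {
            available.add(new byte[bufferSize]);
        }
    }

    /** Reserves {@code required} buffers, waiting at most {@code timeoutMs} milliseconds. */
    Queue<byte[]> reserveCoreBuffers(int required, long timeoutMs) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        Queue<byte[]> reserved = new ArrayDeque<>();
        synchronized (lock) {
            while (reserved.size() < required) {
                if (!available.isEmpty()) {
                    reserved.add(available.poll());
                    continue;
                }
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    // Return what we took and fail loudly instead of deadlocking.
                    available.addAll(reserved);
                    throw new IllegalStateException(
                            "Could not reserve " + required + " core buffers within " + timeoutMs + " ms");
                }
                TimeUnit.NANOSECONDS.timedWait(lock, remainingNanos);
            }
        }
        return reserved;
    }

    /** Returns a buffer to the pool and wakes up waiting reservations. */
    void release(byte[] buffer) {
        synchronized (lock) {
            available.add(buffer);
            lock.notifyAll();
        }
    }
}
{code}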
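And for option 5, a sketch of the core/floating split, again with made-up names. Each local pool reserves only its core buffers; floating buffers come from one shared, best-effort pool, so setting the floating part to 0 degenerates into option 2:

{code:java}
import java.util.concurrent.Semaphore;

/**
 * Sketch for option 5 (illustrative, not the actual LocalBufferPool):
 * core buffers are reserved per pool and always guaranteed; floating
 * buffers live in one semaphore shared by all local pools.
 */
final class SplitBufferPool {

    private final Semaphore coreBuffers;           // owned by this pool only
    private final Semaphore sharedFloatingBuffers; // shared across all pools

    SplitBufferPool(int coreSize, Semaphore sharedFloatingBuffers) {
        this.coreBuffers = new Semaphore(coreSize);
        this.sharedFloatingBuffers = sharedFloatingBuffers;
    }

    /** Prefers a guaranteed core buffer and falls back to the shared floating pool. */
    boolean tryRequestBuffer() {
        if (coreBuffers.tryAcquire()) {
            return true;
        }
        // Floating buffers are best-effort: this may fail under contention.
        return sharedFloatingBuffers.tryAcquire();
    }

    /** The caller must remember whether the buffer it recycles was core or floating. */
    void recycle(boolean wasCore) {
        if (wasCore) {
            coreBuffers.release();
        } else {
            sharedFloatingBuffers.release();
        }
    }
}
{code}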
> Potential deadlock for task reading from blocking ResultPartition.
> ------------------------------------------------------------------
>
>                 Key: FLINK-14872
>                 URL: https://issues.apache.org/jira/browse/FLINK-14872
>             Project: Flink
>          Issue Type: Bug
>            Reporter: Yingjie Cao
>            Priority: Blocker
>             Fix For: 1.10.0
>
>
> Currently, the buffer pool size of an InputGate reading from a blocking
> ResultPartition is unbounded. Such an InputGate can use too many buffers,
> so that the ResultPartition of the same task can not acquire enough core
> buffers, which finally leads to deadlock.
> Consider the following case:
> Core buffers are reserved for the InputGate and ResultPartition -> the
> InputGate consumes lots of buffers (not including the buffers reserved for
> the ResultPartition) -> other tasks acquire exclusive buffers for their
> InputGates and trigger a redistribution of buffers (the buffers taken by the
> previous InputGate can not be released) -> the first task, whose InputGate
> uses lots of buffers, begins to emit records but can not acquire enough core
> buffers (some operators may not emit records immediately, or there is just
> nothing to emit) -> deadlock.
>
> I think we can fix this problem by limiting the number of buffers that can
> be allocated by an InputGate which reads from a blocking ResultPartition, as
> in the sketch below.
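A minimal sketch of the fix suggested above, with hypothetical names (this is not Flink's actual LocalBufferPool): the pool of an InputGate that reads from a blocking ResultPartition gets a hard cap, so it can never grow unboundedly and starve the core buffers of other pools in the same task:

{code:java}
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical bounded pool for an InputGate reading a blocking ResultPartition. */
final class BoundedGateBufferPool {

    private final int maxBuffers;  // hard cap for this gate
    private final int bufferSize;
    private final Queue<ByteBuffer> recycled = new ArrayDeque<>();
    private int allocated;         // buffers handed out so far

    BoundedGateBufferPool(int maxBuffers, int bufferSize) {
        this.maxBuffers = maxBuffers;
        this.bufferSize = bufferSize;
    }

    /** Returns a buffer, or null once the cap is reached and nothing has been recycled. */
    synchronized ByteBuffer requestBuffer() {
        if (!recycled.isEmpty()) {
            return recycled.poll();
        }
        if (allocated < maxBuffers) {
            allocated++;
            return ByteBuffer.allocate(bufferSize);
        }
        // At the cap: the gate must wait for its own recycled buffers
        // instead of draining the global pool.
        return null;
    }

    synchronized void recycle(ByteBuffer buffer) {
        buffer.clear();
        recycled.add(buffer);
    }
}
{code}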