[ https://issues.apache.org/jira/browse/FLINK-16403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051367#comment-17051367 ]
Zhijiang commented on FLINK-16403:
----------------------------------
Duplicate of FLINK-16404.
> Solve the potential deadlock problem when reducing exclusive buffers to zero
> ----------------------------------------------------------------------------
>
> Key: FLINK-16403
> URL: https://issues.apache.org/jira/browse/FLINK-16403
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Network
> Reporter: Zhijiang
> Priority: Critical
>
> One motivation for this issue is to reduce the amount of in-flight data under
> back pressure in order to speed up checkpoints. The current default number of
> exclusive buffers per channel is 2. If we reduce it to 0 and increase the
> number of floating buffers somewhat to compensate, a deadlock may occur,
> because all the floating buffers might be taken by blocked input channels and
> never recycled until barrier alignment completes.
> To address this deadlock concern, we can change the logic on both the sender
> and receiver sides.
> * Sender side: it should revoke any previously received credit after sending
> a checkpoint barrier, which means it will not send any subsequent buffers
> until it receives new credits.
> * Receiver side: after processing the barrier from a channel and marking
> it blocked, it should release the available floating buffers of this blocked
> channel, and resume requesting floating buffers only after barrier alignment.
> That means the receiver would announce new credits to the sender only after
> barrier alignment.
> Another possible benefit is that the floating buffers might be put to better
> use before barrier alignment. We can further evaluate the performance impact
> with the existing micro-benchmark.
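
The sender- and receiver-side changes proposed in the description can be modeled roughly as follows. This is only a toy sketch in Python, not Flink code: the names `Sender`, `FloatingPool`, `InputChannel` and `run_demo` are hypothetical. It assumes that a barrier needs no credit to be sent and that a data buffer returns to the floating pool as soon as it is consumed. The demo uses zero exclusive buffers and two floating buffers shared by two channels.

```python
from dataclasses import dataclass, field

BARRIER = "BARRIER"  # stand-in for a checkpoint barrier event


@dataclass
class Sender:
    """Toy sender side of one channel (hypothetical, not a Flink class)."""
    backlog: list = field(default_factory=list)
    credit: int = 0

    def add_credit(self, n):
        self.credit += n

    def poll(self):
        """Return the next item that may be sent, or None if nothing can go out."""
        if not self.backlog:
            return None
        if self.backlog[0] == BARRIER:
            # Assumption of this model: the barrier itself needs no credit.
            self.credit = 0  # revoke previously received credit
            return self.backlog.pop(0)
        if self.credit == 0:
            return None  # must wait for new credit from the receiver
        self.credit -= 1
        return self.backlog.pop(0)


class FloatingPool:
    """Floating buffers shared by all channels of one input gate."""

    def __init__(self, size):
        self.available = size

    def request(self, wanted):
        granted = min(wanted, self.available)
        self.available -= granted
        return granted

    def recycle(self, n):
        self.available += n


class InputChannel:
    """Toy receiver side of one channel with zero exclusive buffers."""

    def __init__(self, pool, sender):
        self.pool, self.sender = pool, sender
        self.free_buffers = 0  # floating buffers held but not yet filled
        self.blocked = False

    def request_and_announce(self, wanted):
        if self.blocked:
            return 0  # no requests and no credit until alignment
        granted = self.pool.request(wanted)
        self.free_buffers += granted
        self.sender.add_credit(granted)
        return granted

    def receive(self):
        item = self.sender.poll()
        if item == BARRIER:
            self.blocked = True
            self.pool.recycle(self.free_buffers)  # release unused floating buffers
            self.free_buffers = 0
        elif item is not None:
            self.free_buffers -= 1
            self.pool.recycle(1)  # consumed immediately in this model
        return item


def run_demo():
    pool = FloatingPool(2)
    a = InputChannel(pool, Sender(["a1", BARRIER, "a2"]))
    b = InputChannel(pool, Sender(["b1", "b2", BARRIER]))
    log = []
    a.request_and_announce(2)           # channel A takes every floating buffer
    log += [a.receive(), a.receive()]   # "a1", then the barrier blocks A
    log.append(a.receive())             # None: A's sender has no credit left
    b.request_and_announce(2)           # B can use the buffers A released
    log += [b.receive(), b.receive(), b.receive()]  # "b1", "b2", barrier
    if a.blocked and b.blocked:         # barrier alignment reached
        a.blocked = b.blocked = False
    a.request_and_announce(1)           # credits are announced again
    log.append(a.receive())             # "a2"
    return log, pool.available
```

In this model, channel A gives back its unused floating buffer when it sees the barrier, so channel B can still receive the data queued ahead of its own barrier. Without the revoke and release steps, A could keep both buffers while blocked, and B's barrier would never arrive.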