[ https://issues.apache.org/jira/browse/FLINK-16641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17296894#comment-17296894 ]
Yingjie Cao commented on FLINK-16641:
-------------------------------------
[~pnowojski] [~zjwang] I have updated the PR and will add more tests soon.
> Announce sender's backlog to solve the deadlock issue without exclusive
> buffers
> -------------------------------------------------------------------------------
>
> Key: FLINK-16641
> URL: https://issues.apache.org/jira/browse/FLINK-16641
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Network
> Reporter: Zhijiang
> Assignee: Yingjie Cao
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.13.0
>
>
> Together with FLINK-16404, this is the second ingredient needed to solve
> the deadlock problem without exclusive buffers.
> The scenario is as follows:
> * Data in a subpartition with a positive backlog can always be sent,
> because the exclusive credits are eventually fed back to the sender.
> * Without exclusive buffers, the receiver does not request floating buffers
> for a backlog of 0. When new data is later added to such a subpartition,
> the sender currently has no way to notify the receiver of the new backlog
> without positive credit.
> * As a result, the sender and the receiver wait for each other and
> deadlock: the sender waits for credit in order to announce its backlog, and
> the receiver waits for a backlog in order to request floating buffers.
> To solve this problem, the sender needs a separate message, in addition to
> the existing `BufferResponse`, to announce its backlog in such cases. The
> receiver can then use this information to request floating buffers and feed
> credits back to the sender.
> The side effect is increased network transport delay and a possible
> throughput regression. We can measure the impact with the existing
> micro-benchmarks. This cost may be acceptable in exchange for faster
> checkpoints without exclusive buffers. We can explain the trade-off in the
> descriptions of the respective configuration options and let users make the
> final decision in practice.
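
The deadlock and the proposed backlog announcement from the issue description can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not Flink's implementation: the class `BacklogAnnouncementSketch`, the `Receiver` type, and the `deliver` method are hypothetical names, the receiver is assumed to consume and recycle each buffer immediately, and the whole exchange is single-threaded.

```java
/** Toy model of credit-based flow control without exclusive buffers (all names hypothetical). */
final class BacklogAnnouncementSketch {

    /** Receiver side: owns a pool of floating buffers and no exclusive buffers. */
    static final class Receiver {
        private int floatingPool;  // floating buffers currently available
        private int outstanding;   // credits granted to the sender but not yet used

        Receiver(int floatingPool) {
            this.floatingPool = floatingPool;
        }

        /** Reacts to a reported backlog and returns the number of new credits granted. */
        int onBacklog(int backlog) {
            int needed = Math.max(backlog - outstanding, 0);
            int granted = Math.min(needed, floatingPool);
            floatingPool -= granted;
            outstanding += granted;
            return granted;
        }

        /** Receives one data buffer, consumes it at once, and returns it to the pool. */
        void onBuffer() {
            outstanding--;
            floatingPool++;
        }
    }

    /**
     * Tries to deliver {@code records} buffers from one subpartition.
     * The sender starts with zero credit because there are no exclusive buffers.
     * Returns how many buffers were actually delivered.
     */
    static int deliver(int records, int floatingBuffers, boolean announceBacklog) {
        Receiver receiver = new Receiver(floatingBuffers);
        int backlog = records;
        int credits = 0;
        int delivered = 0;

        while (backlog > 0) {
            if (credits == 0) {
                if (!announceBacklog) {
                    break; // deadlock: no credit to send data, no message to report backlog
                }
                // Separate announcement message that does not consume a credit.
                credits += receiver.onBacklog(backlog);
                if (credits == 0) {
                    break; // the receiver has no floating buffers to offer
                }
            }
            // Send one data buffer; the remaining backlog is piggybacked on it.
            credits--;
            backlog--;
            delivered++;
            receiver.onBuffer();
            credits += receiver.onBacklog(backlog);
        }
        return delivered;
    }

    public static void main(String[] args) {
        System.out.println(deliver(5, 2, false)); // prints 0: sender and receiver wait on each other
        System.out.println(deliver(5, 2, true));  // prints 5: the announcement unblocks the exchange
    }
}
```

In this model the announcement is sent only when the sender holds no credit, which is the one case where the backlog cannot be piggybacked on a data message. The extra round trip before the first buffer is the kind of added latency the description mentions.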