[ 
https://issues.apache.org/jira/browse/FLINK-11082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16739385#comment-16739385
 ] 

Piotr Nowojski commented on FLINK-11082:
----------------------------------------

Another issue.

Could this bug explain why one user recently reported higher CPU 
usage and a 300% increase in the number of packets sent between the nodes after 
upgrading from Flink 1.4? We were already aware that credit-based flow 
control increases the network traffic/number of messages sent between nodes by 
100%. But if we announce fresh partial buffers to the receiver immediately, 
could a small chunk of that data be sent prematurely, before 
{{flushRequested}} is set or the next {{BufferConsumer}} is enqueued? Sending a chunk of 
data prematurely and assigning new credit for it would explain the remaining 
unaccounted "200%" of messages being sent.

Btw, [~zjwang], if a channel is idle, will the two exclusive buffers still be 
assigned to the sender, so that it has some buffers for immediate use whenever 
the channel becomes active?

> Increase backlog only if it is available for consumption
> --------------------------------------------------------
>
>                 Key: FLINK-11082
>                 URL: https://issues.apache.org/jira/browse/FLINK-11082
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Network
>    Affects Versions: 1.5.6, 1.6.3, 1.7.1, 1.8.0
>            Reporter: zhijiang
>            Assignee: zhijiang
>            Priority: Major
>
> The backlog should indicate how many buffers are available in the subpartition 
> for downstream consumption. Availability is determined by two 
> factors: one is a finished {{BufferConsumer}}, and the other is a triggered flush.
> In the current implementation, the backlog is increased as soon as a 
> {{BufferConsumer}} is added to the subpartition, even though this 
> {{BufferConsumer}} is not yet available for network transport.
> Furthermore, the backlog affects the requesting of floating buffers on the 
> downstream side. That means some floating buffers are fetched in advance but 
> not used for a long time, so the floating buffers are not used 
> efficiently.
> We found this scenario to be extreme for the rebalance selector on the upstream 
> side, so we want to change when the backlog is increased: on finishing a 
> {{BufferConsumer}} or on a triggered flush.
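
To make the difference between the two accounting schemes in the quoted description concrete, here is a minimal sketch in Java. It is a simplified model, not Flink's actual subpartition code: the names BacklogSketch, SketchBuffer, SketchSubpartition and both backlog methods are hypothetical, and the flush flag is never reset because the model does not poll buffers.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical, simplified model of the backlog accounting discussed above.
// None of these classes exist in Flink; they only illustrate the idea.
final class BacklogSketch {

    /** Stand-in for a buffer that the writer may still be filling. */
    static final class SketchBuffer {
        private boolean finished;

        void finish() {
            finished = true;
        }

        boolean isFinished() {
            return finished;
        }
    }

    /** Stand-in for a subpartition that can report both backlog variants. */
    static final class SketchSubpartition {
        private final Deque<SketchBuffer> buffers = new ArrayDeque<>();
        private boolean flushRequested;

        void add(SketchBuffer buffer) {
            buffers.addLast(buffer);
        }

        /** A flush makes the data written so far available, finished or not. */
        void flush() {
            flushRequested = !buffers.isEmpty();
        }

        /** Behaviour described as current: every added buffer counts at once. */
        int backlogCountedOnAdd() {
            return buffers.size();
        }

        /** Proposed behaviour: count a buffer only once it is finished or flushed. */
        int backlogAvailableForConsumption() {
            int available = 0;
            for (SketchBuffer buffer : buffers) {
                if (buffer.isFinished() || flushRequested) {
                    available++;
                }
            }
            return available;
        }
    }

    public static void main(String[] args) {
        SketchSubpartition subpartition = new SketchSubpartition();
        SketchBuffer first = new SketchBuffer();

        subpartition.add(first);
        // The buffer was added but is neither finished nor flushed yet.
        System.out.println("counted on add: " + subpartition.backlogCountedOnAdd()
                + ", available: " + subpartition.backlogAvailableForConsumption());

        first.finish();
        // Finishing the buffer makes it available for consumption.
        System.out.println("counted on add: " + subpartition.backlogCountedOnAdd()
                + ", available: " + subpartition.backlogAvailableForConsumption());
    }
}
```

In this model, a receiver that sizes its floating-buffer requests from the first number would request a buffer for data that cannot be sent yet, which is the inefficiency the description points at.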



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
