[ https://issues.apache.org/jira/browse/FLINK-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16723037#comment-16723037 ]

Nico Kruber commented on FLINK-10661:
-------------------------------------

[~zjwang] I'm not quite sure I get the problem yet, so let's first try to 
establish a common understanding. Are the following the scenarios you were 
describing?

1) Scaling out: if, for example, we scale from 10 to 20, under full load I 
would expect the output queues of the 10 senders to be full and the input 
queues of the 20 receivers to be at only 50%, simply because there are 10 
network outputs vs. 20 inputs (see the rough numbers below).

2) Scaling in: conversely, if you scale from 20 to 10, input queues are full 
but output queues are at 50%.
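
To put rough numbers on scenario 1: assuming the usual per-gate pool sizing of 
channels * {{taskmanager.network.memory.buffers-per-channel}} plus 
{{taskmanager.network.memory.floating-buffers-per-gate}} (defaults 2 and 8), a 
back-of-envelope sketch (not Flink code) looks like this:

{code:java}
// Back-of-envelope sketch, not Flink code: per-task buffer pools when scaling
// from 10 senders to 20 receivers, using the default per-gate sizing
//   channels * buffers-per-channel + floating-buffers-per-gate.
public class ScaleOutBufferSketch {
    public static void main(String[] args) {
        int buffersPerChannel = 2; // taskmanager.network.memory.buffers-per-channel (default)
        int floatingPerGate = 8;   // taskmanager.network.memory.floating-buffers-per-gate (default)

        // Each of the 10 senders writes to 20 outgoing channels;
        // each of the 20 receivers reads from 10 incoming channels.
        int perSenderOutputPool = 20 * buffersPerChannel + floatingPerGate;  // 48 buffers
        int perReceiverInputPool = 10 * buffersPerChannel + floatingPerGate; // 28 buffers

        System.out.println("per-sender output pool:  " + perSenderOutputPool + " buffers");
        System.out.println("per-receiver input pool: " + perReceiverInputPool + " buffers");
    }
}
{code}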

For these cases, you want to be able to reclaim the unused buffers for some 
other part of the pipeline? Simply splitting {{buffers-per-channel}} into one 
parameter for the sender and a separate one for the receiver won't be enough 
then, because you may have both operations, i.e. scale-out and scale-in, in 
your job graph. What you may want is to be able to fine-tune this per 
operator; that would give you the desired control.
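
Just to make the "splitting" idea concrete, here is a minimal sketch of how a 
separate receiver-side setting could sit next to the existing one; note that 
the {{taskmanager.network.memory.initial-credit}} key below is purely 
hypothetical and does not exist in Flink:

{code:java}
import org.apache.flink.configuration.Configuration;

public class SplitBufferConfigSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Existing option: currently used for both the sender-side output buffers
        // and the receiver-side initial credit.
        conf.setInteger("taskmanager.network.memory.buffers-per-channel", 2);

        // Hypothetical key, only to illustrate a separate receiver-side (initial
        // credit) parameter as discussed in this issue; it does not exist in Flink.
        conf.setInteger("taskmanager.network.memory.initial-credit", 2);

        System.out.println(conf);
    }
}
{code}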

> Initial credit should be configured in a separate parameter
> -----------------------------------------------------------
>
>                 Key: FLINK-10661
>                 URL: https://issues.apache.org/jira/browse/FLINK-10661
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Network
>    Affects Versions: 1.5.4, 1.6.1
>            Reporter: zhijiang
>            Assignee: zhijiang
>            Priority: Minor
>
> In credit-based network flow control, the required credits on the receiver 
> side are calculated as the backlog plus the initial credit, which equals the 
> value of the parameter {{taskmanager.network.memory.buffers-per-channel}}. We 
> add the initial credit on top of the backlog in order to decrease the chance 
> of the sender having to wait for credits. In the best case, sender and 
> receiver work concurrently and do not block each other.
>  
> We found a bad case in some rebalance or rescale scenarios: the outqueue 
> usage reaches 100% on the sender side, but the inqueue usage is only about 
> 50% or less. That means the announced credit is not enough for the sender 
> side even though there are still many free credit resources on the receiver 
> side, so resources are wasted.
>  
> It would be better if we could adjust the credit overhead to tune performance 
> online. That requires a separate parameter for defining the initial credit, 
> decoupled from {{taskmanager.network.memory.buffers-per-channel}}.
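
Restating the calculation from the description above in code form, a minimal 
sketch (illustrative names only, not Flink's actual classes): the credits a 
receiver announces for a channel are the sender's reported backlog plus the 
initial credit taken from {{taskmanager.network.memory.buffers-per-channel}}:

{code:java}
// Illustrative sketch only; not Flink's actual classes or field names.
public class CreditSketch {
    /** Mirrors taskmanager.network.memory.buffers-per-channel (default 2). */
    static final int INITIAL_CREDIT = 2;

    /** Credits announced for one channel, given the backlog reported by the sender. */
    static int requiredCredits(int senderBacklog) {
        return senderBacklog + INITIAL_CREDIT;
    }

    public static void main(String[] args) {
        // With a backlog of 5 buffers queued on the sender side, the receiver
        // announces 5 + 2 = 7 credits, so the sender does not have to wait.
        System.out.println(requiredCredits(5)); // prints 7
    }
}
{code}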



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
