[
https://issues.apache.org/jira/browse/FLINK-10661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16723037#comment-16723037
]
Nico Kruber commented on FLINK-10661:
-------------------------------------
[~zjwang] I'm not quite sure I understand the problem, so let's first establish
a common understanding. Are the following the scenarios you were
describing?
1) scaling out: if, for example, we are scaling from 10 senders to 20
receivers, under full load I would expect the output queues of the 10 to be
full and the input queues of the 20 to be at only 50%, simply because there are
10 network outputs vs. 20 inputs.
2) scaling in: conversely, if you scale from 20 to 10, the input queues are
full but the output queues are at 50%.
For these cases, do you want to be able to reclaim the unused buffers for some
other part of the pipeline? Simply splitting {{buffers-per-channel}} into one
parameter for the sender and a separate one for the receiver won't be enough
then, because you may have both operations, i.e. scale-out and scale-in, in the
same job graph. What you may want instead is to be able to fine-tune this per
operator; that would give you the desired control.
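For reference, the calculation from the issue description (required credits =
backlog + initial credit) can be sketched roughly as below. This is only an
illustration and not Flink's actual implementation: the class and method
names, the sample values, and the idea of passing a separate initial credit
are all assumptions made for the example.

```java
public class CreditSketch {

    /**
     * Credits the receiver tries to hold for one channel: the sender's
     * announced backlog plus a fixed overhead (the "initial credit").
     */
    static int requiredCredits(int backlog, int initialCredit) {
        if (backlog < 0 || initialCredit < 0) {
            throw new IllegalArgumentException("values must be non-negative");
        }
        return backlog + initialCredit;
    }

    public static void main(String[] args) {
        int backlog = 3;            // illustrative sender backlog
        int buffersPerChannel = 2;  // illustrative buffers-per-channel value
        int separateInitialCredit = 4; // hypothetical separate parameter

        // Today: the overhead is tied to buffers-per-channel.
        System.out.println(requiredCredits(backlog, buffersPerChannel));

        // Proposal: the overhead is tuned independently.
        System.out.println(requiredCredits(backlog, separateInitialCredit));
    }
}
```

With a separate value, the overhead could be raised without also changing the
number of exclusive buffers per channel.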
> Initial credit should be configured in a separate parameter
> -----------------------------------------------------------
>
> Key: FLINK-10661
> URL: https://issues.apache.org/jira/browse/FLINK-10661
> Project: Flink
> Issue Type: Sub-task
> Components: Network
> Affects Versions: 1.5.4, 1.6.1
> Reporter: zhijiang
> Assignee: zhijiang
> Priority: Minor
>
> In credit-based network flow control, the required credits on the receiver
> side are calculated as the backlog plus the initial credit, which is equal to
> the value of the parameter {{taskmanager.network.memory.buffers-per-channel}}.
> We add the initial credit as overhead on top of the backlog in order to
> reduce the chance of the sender waiting for credits. Ideally, the sender and
> receiver work concurrently and do not block each other.
>
> We found a bad case in some rebalance or rescale scenarios: the outqueue
> usage reaches 100% on the sender side, but the inqueue usage is about 50% or
> less. That means the announced credits are not enough for the sender side
> although there are still many free credit resources on the receiver side,
> which is unreasonable and wastes resources.
>
> It would be better if we could adjust the credit overhead to debug
> performance online. This requires a separate parameter that defines the
> initial credit independently of
> {{taskmanager.network.memory.buffers-per-channel}}.