zhijiangW commented on a change in pull request #11567:
URL: https://github.com/apache/flink/pull/11567#discussion_r416302882
##########
File path: flink-core/src/main/java/org/apache/flink/configuration/NettyShuffleEnvironmentOptions.java
##########
@@ -174,6 +173,20 @@
 			" help relieve back-pressure caused by unbalanced data distribution among the subpartitions. This value should be" +
 			" increased in case of higher round trip times between nodes and/or larger number of machines in the cluster.");
 
+	/**
+	 * The maximum number of buffers that can be used for each output subpartition.
+	 */
+	@Documentation.Section(Documentation.Sections.ALL_TASK_MANAGER_NETWORK)
+	public static final ConfigOption<Integer> NETWORK_MAX_BUFFERS_PER_CHANNEL =
+		key("taskmanager.network.max-buffers-per-channel")
+			.defaultValue(Integer.MAX_VALUE)
Review comment:
In theory, I think the proper default value here should avoid hurting performance while reducing the backlog of a sub-partition as much as possible.
Ideally it would work like this: if the network or local channel can consume x buffers per second, then a max backlog of (x + 1) buffers per second would be enough to satisfy the pipeline.
In other words, the consumer would never have to wait on the backlog and delay the pipeline; the backlog only needs a little headroom beyond what the consumer can drain.
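To put rough numbers on that reasoning, here is a back-of-the-envelope sketch; the figures are assumed purely for illustration, not measured:

    public class BacklogHeadroomSketch {
        public static void main(String[] args) {
            // All numbers are assumed for illustration, not taken from any benchmark.
            int x = 100;                // buffers/second the channel can consume
            int neededBacklog = x + 1;  // just enough headroom to keep the consumer busy
            // The initial default of Integer.MAX_VALUE effectively means no cap, so
            // under back-pressure the backlog can grow far beyond the drain rate,
            // inflating in-flight data without any throughput benefit.
            System.out.println("sufficient backlog ~ " + neededBacklog + " buffers/second");
        }
    }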
I would guess the default of 10 is a conservative value that should not hurt performance, but I am not sure whether we can reduce it further without dedicated experiments. Either way, a default of 10 should already greatly mitigate the problem of excessive in-flight buffers.
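For completeness, here is a minimal sketch of how a user could override this option once the PR is merged; the value 10 simply mirrors the default discussed above, and NETWORK_MAX_BUFFERS_PER_CHANNEL is the constant added in this diff:

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.configuration.NettyShuffleEnvironmentOptions;

    public class MaxBuffersPerChannelExample {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Cap each output subpartition at 10 buffers; lower values reduce
            // in-flight data under back-pressure at some risk to throughput.
            conf.setInteger(NettyShuffleEnvironmentOptions.NETWORK_MAX_BUFFERS_PER_CHANNEL, 10);
            System.out.println(conf.getInteger(NettyShuffleEnvironmentOptions.NETWORK_MAX_BUFFERS_PER_CHANNEL));
        }
    }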