Jiang Xin created FLINK-33668:
---------------------------------
Summary: Decoupling Shuffle network memory and job topology
Key: FLINK-33668
URL: https://issues.apache.org/jira/browse/FLINK-33668
Project: Flink
Issue Type: Improvement
Components: Runtime / Network
Reporter: Jiang Xin
Fix For: 1.19.0

With [FLINK-30469|https://issues.apache.org/jira/browse/FLINK-30469] and
[FLINK-31643|https://issues.apache.org/jira/browse/FLINK-31643], we
decoupled the shuffle network memory from task parallelism by limiting the
number of buffers each InputGate and ResultPartition requires. However, when
many shuffle tasks run simultaneously on the same TaskManager, "Insufficient
number of network buffers" errors can still occur: the per-gate and
per-partition limits add up, and their sum can exceed the TaskManager's
network memory. This typically happens when slot sharing is enabled or when a
TaskManager has multiple slots.
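The arithmetic behind the failure can be shown with a small hypothetical
sketch. The pool size (2048 buffers) and the per-component limit (100
buffers) are illustrative assumptions, not Flink defaults, and the class and
method names are invented for this example. Each InputGate or ResultPartition
is capped independently of parallelism, but the caps of all components on one
TaskManager still draw from the same fixed pool.

```java
// Hypothetical illustration: names and numbers are assumptions, not Flink APIs or defaults.
public final class NetworkBufferDemand {

    /** Total buffers needed when every gate/partition requires perComponentCap buffers. */
    static long totalRequired(int inputGates, int resultPartitions, int perComponentCap) {
        return (long) (inputGates + resultPartitions) * perComponentCap;
    }

    /** True if a TaskManager's fixed network buffer pool can satisfy the total demand. */
    static boolean fits(long poolSize, int inputGates, int resultPartitions, int perComponentCap) {
        return totalRequired(inputGates, resultPartitions, perComponentCap) <= poolSize;
    }

    public static void main(String[] args) {
        long poolSize = 2048; // assumed network buffer pool size of one TaskManager
        int cap = 100;        // assumed per-gate/per-partition limit, independent of parallelism

        // One task with one InputGate and one ResultPartition: 200 <= 2048.
        System.out.println(fits(poolSize, 1, 1, cap));   // true

        // Slot sharing or multiple slots: 12 gates + 12 partitions need 2400 > 2048.
        System.out.println(fits(poolSize, 12, 12, cap)); // false
    }
}
```

Capping each component bounds its demand, but not the number of components
that share one TaskManager, so the total still grows with how many tasks are
co-located.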
We therefore need to ensure that a TaskManager does not hit "Insufficient
number of network buffers" errors, even when dozens of InputGates and
ResultPartitions run on it simultaneously.
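One way to state this requirement concretely is as a budgeting rule, shown
below as a hypothetical sketch only. It is not the design adopted for this
issue, and the names and numbers are assumptions. Instead of failing when the
preferred per-component limit does not fit, each InputGate or ResultPartition
receives an equal share of the pool, capped at the preferred amount, and the
request fails only if even a minimum per component cannot be guaranteed.

```java
// Hypothetical budgeting rule; not Flink's implementation. All names are illustrative.
public final class BufferBudgetSketch {

    /**
     * Gives each of `components` gates/partitions an equal share of `poolSize` buffers,
     * capped at `preferred`. Returns -1 if not even `minimum` per component fits.
     */
    static int buffersPerComponent(long poolSize, int components, int preferred, int minimum) {
        if (components <= 0) {
            throw new IllegalArgumentException("components must be positive");
        }
        long fairShare = poolSize / components; // integer division: floor of the equal share
        if (fairShare < minimum) {
            return -1; // the pool is genuinely too small for this many components
        }
        return (int) Math.min(preferred, fairShare);
    }

    public static void main(String[] args) {
        // 24 components share 2048 buffers: each gets 2048 / 24 = 85 instead of 100.
        System.out.println(buffersPerComponent(2048, 24, 100, 2)); // 85
        // 2 components: the preferred 100 fits, so nothing is reduced.
        System.out.println(buffersPerComponent(2048, 2, 100, 2));  // 100
    }
}
```

This static split assumes all components are known up front. A real
TaskManager sees tasks start and finish over time, so any practical scheme
would also have to redistribute buffers as components come and go.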