Jiang Xin created FLINK-33668:
---------------------------------

             Summary: Decoupling Shuffle network memory and job topology
                 Key: FLINK-33668
                 URL: https://issues.apache.org/jira/browse/FLINK-33668
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / Network
            Reporter: Jiang Xin
             Fix For: 1.19.0


With [FLINK-30469|https://issues.apache.org/jira/browse/FLINK-30469] and 
[FLINK-31643|https://issues.apache.org/jira/browse/FLINK-31643], we have 
decoupled the shuffle network memory from the parallelism of tasks by limiting 
the number of buffers for each InputGate and ResultPartition. However, when too 
many shuffle tasks run simultaneously on the same TaskManager, 
"Insufficient number of network buffers" errors can still occur. This usually 
happens when a Slot Sharing Group is enabled or a TaskManager contains multiple 
slots.

We therefore need to make sure that a TaskManager does not encounter 
"Insufficient number of network buffers" errors even when dozens of InputGates 
and ResultPartitions are running on it simultaneously.
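
To illustrate why a per-gate limit alone is not enough, here is a rough 
back-of-the-envelope sketch. It is not Flink code. The function names, the 
memory size, the segment size, and the per-gate and per-partition buffer counts 
are all assumptions chosen for illustration. It only shows that total demand 
grows with the number of InputGates and ResultPartitions on a TaskManager while 
the buffer pool stays fixed.

```python
# Illustrative arithmetic only. None of these numbers come from the issue,
# and the helper names are hypothetical, not Flink APIs.

SEGMENT_SIZE_BYTES = 32 * 1024            # assumed size of one network buffer
NETWORK_MEMORY_BYTES = 64 * 1024 * 1024   # assumed network memory per TaskManager


def available_buffers(network_memory_bytes: int, segment_size_bytes: int) -> int:
    """Number of fixed-size buffers that fit into the network memory."""
    return network_memory_bytes // segment_size_bytes


def required_buffers(num_input_gates: int, num_result_partitions: int,
                     buffers_per_gate: int, buffers_per_partition: int) -> int:
    """Total buffers needed if every gate and partition takes its capped share."""
    return (num_input_gates * buffers_per_gate
            + num_result_partitions * buffers_per_partition)


def has_enough_buffers(num_input_gates: int, num_result_partitions: int,
                       buffers_per_gate: int = 100,
                       buffers_per_partition: int = 100) -> bool:
    """True if the assumed pool covers the assumed demand."""
    demand = required_buffers(num_input_gates, num_result_partitions,
                              buffers_per_gate, buffers_per_partition)
    supply = available_buffers(NETWORK_MEMORY_BYTES, SEGMENT_SIZE_BYTES)
    return demand <= supply


if __name__ == "__main__":
    # Same per-gate cap, increasing number of gates and partitions per TaskManager.
    for count in (4, 8, 12):
        print(count, has_enough_buffers(count, count))
```

Under these assumed numbers the per-gate cap is unchanged in every case, yet 
the demand eventually exceeds the pool once enough gates and partitions share 
one TaskManager, which is the situation described above.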



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
