Yun Tang created FLINK-32027:
--------------------------------
Summary: Batch jobs could hang at shuffle phase when max
parallelism is really large
Key: FLINK-32027
URL: https://issues.apache.org/jira/browse/FLINK-32027
Project: Flink
Issue Type: Bug
Components: Runtime / Network
Affects Versions: 1.17.0
Reporter: Yun Tang
Fix For: 1.17.1
Attachments: image-2023-05-08-11-12-58-361.png
In batch stream mode with the adaptive batch scheduler, if we set the max
parallelism as large as 32768 (pipeline.max-parallelism), the job could hang at
the shuffle phase. It would hang for a long time and show "No bytes sent":
!image-2023-05-08-11-12-58-361.png!
After some debugging, we found that the downstream operator did not receive
the end-of-partition event.
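For anyone trying to reproduce this, the setup described above roughly
corresponds to the configuration fragment below. This is only a sketch: the
report names pipeline.max-parallelism explicitly, while the other two keys are
an assumption about how batch mode and the adaptive batch scheduler were
enabled, and the actual job may have set them differently (for example in
code).

```
# Assumed: run the job in batch execution mode
execution.runtime-mode: batch
# Assumed: select the adaptive batch scheduler explicitly
jobmanager.scheduler: AdaptiveBatch
# From the report: the large max parallelism that triggers the hang
pipeline.max-parallelism: 32768
```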
--
This message was sent by Atlassian Jira
(v8.20.10#820010)