[ https://issues.apache.org/jira/browse/FLINK-32027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Weijie Guo closed FLINK-32027.
------------------------------
Fix Version/s: 1.18.0
Resolution: Fixed
master (1.18) via 63443aec09ece8596321328273c1e431e5029c4d.
release-1.17 via 8e5fb18ae5a80c4d0620979a944b017b203cdeac.
release-1.16 via c5a883d3976fc8367eba446790088ff46e59ab79.
> Batch jobs could hang at shuffle phase when max parallelism is really large
> ---------------------------------------------------------------------------
>
> Key: FLINK-32027
> URL: https://issues.apache.org/jira/browse/FLINK-32027
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Network
> Affects Versions: 1.16.0, 1.17.0, 1.16.1
> Reporter: Yun Tang
> Assignee: Weijie Guo
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 1.16.2, 1.18.0, 1.17.1
>
> Attachments: image-2023-05-08-11-12-58-361.png
>
>
> In batch execution mode with the adaptive batch scheduler, if we set the max
> parallelism (pipeline.max-parallelism) as large as 32768, the job could hang at
> the shuffle phase:
> It would hang for a long time and show "No bytes sent":
> !image-2023-05-08-11-12-58-361.png!
> After some debugging, we found that the downstream operator did not receive
> the end-of-partition event.
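For reference, a flink-conf.yaml fragment approximating the setup described in the report might look like the following. This is a sketch, not the reporter's actual configuration: only pipeline.max-parallelism and its value of 32768 come from the report. Selecting the scheduler explicitly through jobmanager.scheduler is an assumption, since Flink 1.17 already uses the adaptive batch scheduler by default for batch jobs.

{noformat}
# Run the job in batch execution mode
execution.runtime-mode: BATCH
# Select the adaptive batch scheduler explicitly (assumption; default for batch jobs since 1.17)
jobmanager.scheduler: AdaptiveBatch
# The large max parallelism reported to trigger the hang at the shuffle phase
pipeline.max-parallelism: 32768
{noformat}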
--
This message was sent by Atlassian Jira
(v8.20.10#820010)