Sai Sharath Dandi created FLINK-37018:
-----------------------------------------
Summary: Adaptive scheduler triggers multiple internal restarts
for a single rescale event
Key: FLINK-37018
URL: https://issues.apache.org/jira/browse/FLINK-37018
Project: Flink
Issue Type: Bug
Components: Runtime / Task
Reporter: Sai Sharath Dandi
Attachments: jobmanager.log
We observe that a single rescale event from the autoscaler triggers multiple
internal restarts by the adaptive scheduler, even though the job has no other
reason to restart internally. There can be 2-3 restarts over a very short
period (1-2 minutes) before the job stabilizes.
In the attached job manager logs, we can see the following messages:
# Can change the parallelism of job. Restarting job. (17 times)
# Received resource requirements from job (7 times)
The job was internally restarted 17 times despite receiving only 7 rescaling
requests from the autoscaler, i.e. more than two restarts per request on average.
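
The counts above can be reproduced from the attached log with a short script.
The following is a minimal Python sketch, assuming the two messages appear
verbatim as substrings of the log lines; the function name count_messages and
the result keys are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Message substrings quoted in this report. That they appear verbatim
# inside each log line is an assumption about the log layout.
RESTART_MSG = "Can change the parallelism of job. Restarting job."
REQUIREMENTS_MSG = "Received resource requirements from job"


def count_messages(lines):
    """Count restart and resource-requirement messages in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        if RESTART_MSG in line:
            counts["restarts"] += 1
        if REQUIREMENTS_MSG in line:
            counts["requirements"] += 1
    return counts


# Usage (the path is the attachment name from this issue):
#   with open("jobmanager.log", encoding="utf-8", errors="replace") as f:
#       counts = count_messages(f)
#   print(counts["restarts"], counts["requirements"])
```

Comparing the two counts gives the number of internal restarts per rescale
request.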
--
This message was sent by Atlassian Jira
(v8.20.10#820010)