[
https://issues.apache.org/jira/browse/FLINK-37018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sai Sharath Dandi updated FLINK-37018:
--------------------------------------
Description:
We observe that a single rescale event from the autoscaler triggers multiple
internal restarts by the adaptive scheduler, even though the job has no
other reason/exception for internal restarts. There can be 2-3 restarts over a
very short period (1-2 mins) before the job stabilizes.
In the attached job manager logs, we can see the following messages:
# Can change the parallelism of job. Restarting job. (17 times)
# Received resource requirements from job (7 times).
The job was internally restarted 17 times despite receiving only 7 rescaling
requests from the autoscaler.
was:
We observe that a single rescale event from autoscaler triggers multiple
internal restarts by the adaptive scheduler despite the job not having any
other reason for internal restarts. There can be 2-3 restarts over a very short
period (1-2 mins) before the job stabilizes.
In the attached job manager logs, we can see there are
# Can change the parallelism of job. Restarting job. (17 times)
#
Received resource requirements from job (7 times).
The job was internal restarted 17 times despite receiving only 7 requests from
the autoscaler for rescalings
> Adaptive scheduler triggers multiple internal restarts for a single rescale
> event
> ---------------------------------------------------------------------------------
>
> Key: FLINK-37018
> URL: https://issues.apache.org/jira/browse/FLINK-37018
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Task
> Reporter: Sai Sharath Dandi
> Priority: Major
> Attachments: jobmanager.log
>
>
> We observe that a single rescale event from the autoscaler triggers multiple
> internal restarts by the adaptive scheduler, even though the job has no
> other reason/exception for internal restarts. There can be 2-3 restarts over
> a very short period (1-2 mins) before the job stabilizes.
> In the attached job manager logs, we can see the following messages:
> # Can change the parallelism of job. Restarting job. (17 times)
> # Received resource requirements from job (7 times).
>
> The job was internally restarted 17 times despite receiving only 7 rescaling
> requests from the autoscaler.
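
For anyone who wants to reproduce the counts from the attached jobmanager.log,
here is a minimal sketch that tallies the two messages quoted in the
description. It assumes each message appears verbatim as a substring of a log
line; the function names and the file path are illustrative and not part of
Flink. With the reported numbers, 17 restarts over 7 requests comes to roughly
2.4 restarts per request, which matches the "2-3 restarts" observation.

```python
from collections import Counter

# Message fragments as quoted in the issue description (assumed to appear
# verbatim in the log lines).
RESTART_MSG = "Can change the parallelism of job. Restarting job."
REQUEST_MSG = "Received resource requirements from job"


def count_rescale_events(lines):
    """Count restart and resource-requirement messages in an iterable of log lines."""
    counts = Counter(restarts=0, requests=0)
    for line in lines:
        if RESTART_MSG in line:
            counts["restarts"] += 1
        elif REQUEST_MSG in line:
            counts["requests"] += 1
    return counts


def restarts_per_request(counts):
    """Return the average number of restarts per request, or None if no requests were seen."""
    if counts["requests"] == 0:
        return None
    return counts["restarts"] / counts["requests"]


if __name__ == "__main__":
    # Hypothetical path: point this at the attachment from the issue.
    with open("jobmanager.log", encoding="utf-8", errors="replace") as log:
        result = count_rescale_events(log)
    print(result["restarts"], result["requests"], restarts_per_request(result))
```

A ratio noticeably above 1 would indicate that the scheduler restarts the job
more often than the autoscaler asks it to.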