[
https://issues.apache.org/jira/browse/FLINK-34318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gyula Fora closed FLINK-34318.
------------------------------
Resolution: Fixed
Closing this as a duplicate. Thanks, [~fanrui].
> AdaptiveScheduler resource stabilisation should happen before the job is
> cancelled
> ----------------------------------------------------------------------------------
>
> Key: FLINK-34318
> URL: https://issues.apache.org/jira/browse/FLINK-34318
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Coordination
> Reporter: Gyula Fora
> Priority: Major
>
> When a new resource requirement that raises the resource upper bound (max
> TaskManagers) is submitted to the AdaptiveScheduler, the job is cancelled
> as soon as the first TaskManager comes up.
> Once the job is cancelled, the scheduler waits out the entire stabilisation
> period if it cannot acquire all requested resources, and only then restarts
> the job with the lower-than-requested parallelism.
> The problem is that the wait for resource stabilisation happens after the
> job is cancelled, which introduces unnecessary downtime if the
> stabilisation period is long.
> We should change the logic to wait for the stabilisation period first, so
> that all possible resources are acquired before the job is cancelled.
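
The difference between the two orderings described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Flink code: the names `Outcome`, `wait_for_resources`, `cancel_then_stabilise` and `stabilise_then_cancel` are hypothetical, time is in arbitrary integer units, the stabilisation window is assumed to start when the first new resource arrives, and the cost of the restart itself is ignored.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Outcome:
    cancel_time: int   # when the running job is cancelled
    restart_time: int  # when the job starts again
    extra_slots: int   # additional resources available at restart

    @property
    def downtime(self) -> int:
        return self.restart_time - self.cancel_time


def wait_for_resources(arrivals: Sequence[int], wanted: int, timeout: int) -> Tuple[int, int]:
    """Return (time the wait ends, resources acquired by then).

    The wait ends as soon as `wanted` resources have arrived, or when the
    stabilisation timeout (counted from the first arrival) expires.
    """
    if not arrivals or wanted < 1:
        raise ValueError("need at least one arrival and wanted >= 1")
    times = sorted(arrivals)
    deadline = times[0] + timeout
    if len(times) >= wanted and times[wanted - 1] <= deadline:
        return times[wanted - 1], wanted
    acquired = sum(1 for t in times if t <= deadline)
    return deadline, acquired


def cancel_then_stabilise(arrivals: Sequence[int], wanted: int, timeout: int) -> Outcome:
    """Behaviour described in the issue: cancel on first arrival, then wait."""
    end, acquired = wait_for_resources(arrivals, wanted, timeout)
    return Outcome(cancel_time=min(arrivals), restart_time=end, extra_slots=acquired)


def stabilise_then_cancel(arrivals: Sequence[int], wanted: int, timeout: int) -> Outcome:
    """Proposed behaviour: keep the job running while waiting, then cancel."""
    end, acquired = wait_for_resources(arrivals, wanted, timeout)
    return Outcome(cancel_time=end, restart_time=end, extra_slots=acquired)
```

For example, with arrivals at times 10, 12 and 100, three resources wanted and a timeout of 30, both orderings restart with two extra resources at time 40. The first ordering has been down since time 10, while the second has no downtime attributable to the wait.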