[jira] [Commented] (FLINK-31457) Support waiting for required resources in DefaultScheduler during job restart

Junrui Li (Jira) Tue, 14 Mar 2023 21:39:05 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-31457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17700503#comment-17700503
 ]


Junrui Li commented on FLINK-31457:
-----------------------------------

[~a.pilipenko] I'm not sure in which scenario 
`NoResourceAvailableException` is reported after a job restart. Could you 
describe it in detail?

IIUC, in a session cluster the slots may be taken by other jobs once the 
slot idle timeout expires. You could try increasing `slot.idle.timeout`.
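
As a minimal sketch, assuming the cluster is configured through 
flink-conf.yaml, the timeout could be raised like this. The value is in 
milliseconds and is only illustrative, not a recommendation:

```yaml
# flink-conf.yaml
# Keep idle slots in the job's slot pool longer before they are released.
# 120000 ms is an example value chosen for illustration only.
slot.idle.timeout: 120000
```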

In addition, the adaptive scheduler can wait for resources because it can 
adjust the parallelism dynamically and run a job with a lower parallelism 
when resources are insufficient. The default scheduler has no such 
capability, so it reports `NoResourceAvailableException` when resources are 
insufficient. If you want jobs to run even when resources are insufficient, 
you can use the adaptive scheduler for streaming jobs.
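
A rough sketch of what switching a streaming job to the adaptive scheduler 
might look like in flink-conf.yaml. The timeout values below are 
illustrative assumptions, so please check the configuration docs for your 
Flink version before relying on them:

```yaml
# flink-conf.yaml
# Use the adaptive scheduler instead of the default scheduler.
jobmanager.scheduler: adaptive
# Maximum time to wait for the required resources (example value).
jobmanager.adaptive-scheduler.resource-wait-timeout: 5 min
# How long the available resources must stay stable before the job is
# (re)scheduled with them (example value).
jobmanager.adaptive-scheduler.resource-stabilization-timeout: 10 s
```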

> Support waiting for required resources in DefaultScheduler during job restart
> -----------------------------------------------------------------------------
>
>                 Key: FLINK-31457
>                 URL: https://issues.apache.org/jira/browse/FLINK-31457
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Coordination
>    Affects Versions: 1.15.3
>            Reporter: Aleksandr Pilipenko
>            Priority: Major
>
> Currently Flink supports [waiting for required resources to become 
> available|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-adaptive-scheduler-resource-stabilization-timeout]
>  during job restart only when using the adaptive scheduler.
> On the other hand, if the cluster is using the default scheduler and there 
> are not enough slots available, restart attempts will fail with 
> `NoResourceAvailableException` until the resource requirements are satisfied.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
