[
https://issues.apache.org/jira/browse/FLINK-37024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zhanghao Chen updated FLINK-37024:
----------------------------------
Description: We observed that a task can be stuck in the DEPLOYING state forever
when the task loading/instantiating logic has issues. Cancelling the job, or a
failover caused by failures of other tasks, will also get stuck because the
cancel watchdog currently does not cover tasks in the CREATED/DEPLOYING states.
We should make the cancel watchdog cover tasks in DEPLOYING as well (there is no
need to cover tasks in CREATED, as there is no real logic between
CREATED->DEPLOYING). (was:
We observed that a task can be stuck in the DEPLOYING state forever when the
task initializing logic has issues. Cancelling the job, or a failover caused by
failures of other tasks, will also get stuck because the cancel watchdog
currently does not cover tasks in the CREATED/DEPLOYING states. We should make
the cancel watchdog cover tasks in DEPLOYING as well (there is no need to cover
tasks in CREATED, as there is no real logic between CREATED->DEPLOYING).)
> Task can be stuck in deploying state forever when canceling job/failover
> ------------------------------------------------------------------------
>
> Key: FLINK-37024
> URL: https://issues.apache.org/jira/browse/FLINK-37024
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Task
> Affects Versions: 1.20.0
> Reporter: Zhanghao Chen
> Priority: Major
>
> We observed that a task can be stuck in the DEPLOYING state forever when the
> task loading/instantiating logic has issues. Cancelling the job, or a failover
> caused by failures of other tasks, will also get stuck because the cancel
> watchdog currently does not cover tasks in the CREATED/DEPLOYING states. We
> should make the cancel watchdog cover tasks in DEPLOYING as well (there is no
> need to cover tasks in CREATED, as there is no real logic between
> CREATED->DEPLOYING).
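
The behaviour described above can be illustrated with a minimal, self-contained
sketch. It is not Flink's implementation: the class CancelWatchdogSketch, the
reduced State enum, the two coverage sets and the method cancelTerminates are
all hypothetical names chosen for illustration, and the watchdog here simply
interrupts the stuck thread after a timeout. The sketch only shows why a cancel
request never completes for a task stuck in DEPLOYING when that state is not
covered, and how adding DEPLOYING to the covered states lets the cancel finish.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.concurrent.CountDownLatch;

public class CancelWatchdogSketch {

    /** Simplified subset of task states (assumption); not Flink's full state enum. */
    enum State { CREATED, DEPLOYING, RUNNING }

    /** Coverage as reported in the issue: CREATED/DEPLOYING are not watched. */
    static final Set<State> REPORTED_COVERAGE = EnumSet.of(State.RUNNING);

    /** Coverage proposed in the issue: DEPLOYING is watched too, CREATED is not. */
    static final Set<State> PROPOSED_COVERAGE = EnumSet.of(State.DEPLOYING, State.RUNNING);

    /**
     * Simulates cancelling a task whose loading logic blocks forever.
     * Returns true if the task thread terminated within 10 * timeoutMs.
     * timeoutMs must be positive.
     */
    static boolean cancelTerminates(State stateAtCancel, Set<State> coverage, long timeoutMs) {
        // Stand-in for task loading/instantiating logic that never returns.
        CountDownLatch neverReleased = new CountDownLatch(1);
        Thread task = new Thread(() -> {
            try {
                neverReleased.await();
            } catch (InterruptedException e) {
                // Interrupted by the watchdog: let the thread exit.
            }
        });
        task.setDaemon(true);
        task.start();

        // The watchdog is only started if the state at cancel time is covered.
        if (coverage.contains(stateAtCancel)) {
            Thread watchdog = new Thread(() -> {
                try {
                    task.join(timeoutMs);          // give the task time to stop on its own
                    if (task.isAlive()) {
                        task.interrupt();          // simplified escalation step
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            watchdog.setDaemon(true);
            watchdog.start();
        }

        try {
            task.join(timeoutMs * 10);             // bounded wait for the cancel to complete
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        boolean terminated = !task.isAlive();
        neverReleased.countDown();                 // release the stand-in task either way
        return terminated;
    }

    public static void main(String[] args) {
        System.out.println("reported coverage: "
                + cancelTerminates(State.DEPLOYING, REPORTED_COVERAGE, 50));
        System.out.println("proposed coverage: "
                + cancelTerminates(State.DEPLOYING, PROPOSED_COVERAGE, 50));
    }
}
```

In this model CREATED is deliberately left out of the proposed coverage,
mirroring the remark that there is no real logic between CREATED->DEPLOYING.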