Zhanghao Chen created FLINK-37024:
-------------------------------------

Summary: Task can be stuck in deploying state forever when canceling job/failover
Key: FLINK-37024
URL: https://issues.apache.org/jira/browse/FLINK-37024
Project: Flink
Issue Type: Bug
Components: Runtime / Task
Affects Versions: 1.20.0
Reporter: Zhanghao Chen

We observed that a task can be stuck in the DEPLOYING state forever when its initialization logic has issues. Cancelling the job, or a failover triggered by failures of other tasks, then also gets stuck, because the cancel watchdog currently does not cover tasks in the CREATED or DEPLOYING state. We should make the cancel watchdog cover tasks in DEPLOYING as well. There is no need to cover tasks in CREATED, as there is no real logic between CREATED and DEPLOYING.
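
The proposed change in coverage can be illustrated with a minimal sketch. This is not Flink's actual Task code: the class CancelWatchdogSketch, the simplified TaskState enum, and the method armsWatchdog are hypothetical names introduced for illustration only, and the assumption that INITIALIZING and RUNNING are the states covered today is inferred from the report rather than stated in it.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch only; not Flink's actual Task implementation. */
final class CancelWatchdogSketch {

    /** Simplified stand-in for the task states named in the report. */
    enum TaskState { CREATED, DEPLOYING, INITIALIZING, RUNNING }

    /** Assumption: states in which a cancel request arms the watchdog today. */
    static final Set<TaskState> COVERED_TODAY =
            EnumSet.of(TaskState.INITIALIZING, TaskState.RUNNING);

    /** Proposed coverage: DEPLOYING is added, CREATED stays excluded. */
    static final Set<TaskState> COVERED_PROPOSED =
            EnumSet.of(TaskState.DEPLOYING, TaskState.INITIALIZING, TaskState.RUNNING);

    /** Returns true if cancelling a task in the given state would arm the watchdog. */
    static boolean armsWatchdog(Set<TaskState> covered, TaskState stateAtCancel) {
        return covered.contains(stateAtCancel);
    }

    public static void main(String[] args) {
        // Print the coverage of each state under both policies.
        for (TaskState s : TaskState.values()) {
            System.out.println(s
                    + ": today=" + armsWatchdog(COVERED_TODAY, s)
                    + ", proposed=" + armsWatchdog(COVERED_PROPOSED, s));
        }
    }
}
```

Under these assumptions, only the DEPLOYING row differs between the two policies, which matches the scope of the fix suggested above.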