Zhu Zhu created FLINK-14331:
-------------------------------
Summary: Reset vertices right after they transition to terminated
states
Key: FLINK-14331
URL: https://issues.apache.org/jira/browse/FLINK-14331
Project: Flink
Issue Type: Sub-task
Components: Runtime / Coordination
Affects Versions: 1.10.0
Reporter: Zhu Zhu
Fix For: 1.10.0
Currently, in DefaultScheduler, tasks to be restarted remain in a terminated
state until they are re-scheduled by the SchedulingStrategy.
This behavior may cause two problems:
1. Failed/Canceled tasks may not be restartable under lazy scheduling. For
example, suppose the job A1--pipelined-->B1 fails. Only A1 is re-scheduled in
restartTasks(), since the inputs of B1 are not ready. B1 should be scheduled
later, on the partition consumable event from the restarted A1, but the
terminated state of B1 prevents it from being scheduled.
2. A task can stay in FAILED/CANCELED state for a long time if its inputs take
a long time to become ready again. This is unfriendly to users and may cause
confusion.
I therefore propose resetting vertices right after they transition to
terminated states.
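
To illustrate problem 1 and the proposed fix, here is a minimal sketch. It is
not Flink code: ResetSketch, State, Vertex, scheduleIfCreated() and
transitionTo() are hypothetical names, and the rule "only a CREATED vertex can
be scheduled" is a simplifying assumption about how lazy scheduling treats B1.

```java
import java.util.EnumSet;
import java.util.Set;

public class ResetSketch {
    /** Simplified stand-in for task states; not Flink's ExecutionState. */
    enum State { CREATED, SCHEDULED, RUNNING, FAILED, CANCELED }

    /** The terminated states discussed in this issue. */
    static final Set<State> TERMINATED = EnumSet.of(State.FAILED, State.CANCELED);

    /** Hypothetical vertex; not Flink's ExecutionVertex. */
    static final class Vertex {
        final String name;
        State state = State.CREATED;

        Vertex(String name) {
            this.name = name;
        }

        /** Assumption: a vertex can only be scheduled from CREATED. */
        boolean scheduleIfCreated() {
            if (state != State.CREATED) {
                return false; // a FAILED/CANCELED vertex is skipped
            }
            state = State.SCHEDULED;
            return true;
        }

        /** With resetOnTermination, the vertex is reset as soon as it terminates. */
        void transitionTo(State next, boolean resetOnTermination) {
            state = next;
            if (resetOnTermination && TERMINATED.contains(next)) {
                state = State.CREATED;
            }
        }
    }

    public static void main(String[] args) {
        // Current behavior: B1 stays CANCELED, so the later
        // partition consumable event cannot schedule it.
        Vertex stale = new Vertex("B1");
        stale.transitionTo(State.CANCELED, false);
        System.out.println("without reset: scheduled=" + stale.scheduleIfCreated());

        // Proposed behavior: B1 is reset right after termination,
        // so the later event can schedule it.
        Vertex reset = new Vertex("B1");
        reset.transitionTo(State.CANCELED, true);
        System.out.println("with reset: scheduled=" + reset.scheduleIfCreated());
    }
}
```

In this sketch the reset happens inside the state transition itself, so
scheduling B1 no longer depends on restartTasks() having picked it up.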