[ 
https://issues.apache.org/jira/browse/FLINK-14331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-14331:
-----------------------------------
    Labels: pull-request-available  (was: )

> Reset vertices right after they transition to terminated states
> ---------------------------------------------------------------
>
>                 Key: FLINK-14331
>                 URL: https://issues.apache.org/jira/browse/FLINK-14331
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>    Affects Versions: 1.10.0
>            Reporter: Zhu Zhu
>            Assignee: Zhu Zhu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.10.0
>
>
> Currently in DefaultScheduler, tasks to be restarted remain in a terminated 
> state until they are re-scheduled by the SchedulingStrategy.
> This behavior may cause 2 problems:
> 1. Failed/canceled tasks may not be restartable under lazy scheduling. 
> For example, suppose the job A1--pipelined-->B1 fails. Only A1 is 
> re-scheduled in restartTasks(), since the inputs of B1 are not ready. B1 
> should be scheduled later, on the partition consumable event from the 
> restarted A1, but B1's terminal state prevents it from being scheduled.
> 2. A task can stay in the FAILED/CANCELED state for a long time if its 
> inputs take a long time to become ready again. This is unfriendly to 
> users and may cause confusion.
> That is why I propose resetting vertices right after they transition to 
> terminated states.
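
To make problem 1 concrete, here is a minimal, self-contained sketch of the A1--pipelined-->B1 scenario. It is not Flink code: the class `VertexResetSketch`, its `State` enum, and the methods `schedule`, `onTerminated`, and `replayFailure` are hypothetical names made up for illustration, and the rule that only a CREATED vertex may be scheduled is a simplifying assumption about how the scheduler guards state transitions.

```java
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model for illustration only; none of these names are Flink APIs.
class VertexResetSketch {

    // Simplified stand-in for a vertex's execution state.
    enum State { CREATED, SCHEDULED, FAILED, CANCELED }

    private static final Set<State> TERMINAL = EnumSet.of(State.FAILED, State.CANCELED);

    private final Map<String, State> states = new LinkedHashMap<>();
    private final boolean resetOnTermination;

    VertexResetSketch(boolean resetOnTermination, String... vertexIds) {
        this.resetOnTermination = resetOnTermination;
        for (String id : vertexIds) {
            states.put(id, State.CREATED);
        }
    }

    // Assumption: only a vertex in CREATED state can be scheduled.
    boolean schedule(String id) {
        if (states.get(id) != State.CREATED) {
            return false;
        }
        states.put(id, State.SCHEDULED);
        return true;
    }

    // Called when a vertex reaches a terminal state. With the proposed
    // behavior the vertex is reset to CREATED immediately.
    void onTerminated(String id, State terminal) {
        if (!TERMINAL.contains(terminal)) {
            throw new IllegalArgumentException("not a terminal state: " + terminal);
        }
        states.put(id, resetOnTermination ? State.CREATED : terminal);
    }

    // Replays the failure of A1 --pipelined--> B1 and reports whether B1
    // could be scheduled again once A1's partition becomes consumable.
    static boolean replayFailure(boolean resetOnTermination) {
        VertexResetSketch s = new VertexResetSketch(resetOnTermination, "A1", "B1");
        s.schedule("A1");
        s.schedule("B1");

        // The job fails: A1 fails and B1 is canceled.
        s.onTerminated("A1", State.FAILED);
        s.onTerminated("B1", State.CANCELED);

        // restartTasks() under lazy scheduling: only A1 is re-scheduled now.
        // In the old behavior, a vertex is reset only at this point.
        if (!resetOnTermination) {
            s.states.put("A1", State.CREATED);
        }
        s.schedule("A1");

        // Partition consumable event from the restarted A1 triggers B1.
        return s.schedule("B1");
    }

    public static void main(String[] args) {
        System.out.println("B1 rescheduled without reset: " + replayFailure(false));
        System.out.println("B1 rescheduled with reset:    " + replayFailure(true));
    }
}
```

In this model, B1 stays stuck in CANCELED when vertices are reset only at re-scheduling time, while resetting on termination leaves it in CREATED and therefore schedulable when the partition consumable event arrives.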



--
This message was sent by Atlassian Jira
(v8.3.4#803005)