zhuzhurk commented on a change in pull request #9791: [FLINK-14248][runtime] 
Let LazyFromSourcesSchedulingStrategy restart terminated tasks
URL: https://github.com/apache/flink/pull/9791#discussion_r330592182
 
 

 ##########
 File path: 
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/strategy/LazyFromSourcesSchedulingStrategy.java
 ##########
 @@ -78,19 +81,22 @@ public void startScheduling() {
                        deploymentOptions.put(schedulingVertex.getId(), option);
                }
 
-               
allocateSlotsAndDeployExecutionVertexIds(getAllVerticesFromTopology());
+               allocateSlotsAndDeployExecutionVertices(
+                       
getSchedulingExecutionVertices(getAllVerticesFromTopology()),
+                       IS_IN_CREATED_STATE);
        }
 
        @Override
-       public void restartTasks(Set<ExecutionVertexID> verticesToRestart) {
+       public void restartTasks(final Set<ExecutionVertexID> 
verticesToRestart) {
+               final Set<SchedulingExecutionVertex> verticesToSchedule = 
getSchedulingExecutionVertices(verticesToRestart);
+
                // increase counter of the dataset first
-               verticesToRestart
+               verticesToSchedule
                        .stream()
-                       .map(schedulingTopology::getVertexOrThrow)
                        .flatMap(vertex -> 
vertex.getProducedResultPartitions().stream())
                        
.forEach(inputConstraintChecker::resetSchedulingResultPartition);
 
-               allocateSlotsAndDeployExecutionVertexIds(verticesToRestart);
+               allocateSlotsAndDeployExecutionVertices(verticesToSchedule, 
IS_IN_TERMINAL_STATE);
 
 Review comment:
   It just came to me that it can be problematic if we keep the lazy resetting.
   e.g. The job A1--pipelined-->B1 fails. And only A1 will be re-scheduled on 
`restartTasks()` since the inputs of B1 are not ready. B1 should be scheduled 
later on the partition consumable event from restarted A1. But the terminal 
state of B1 will prevent B1 from being scheduled.
   
   Seems we lack a test for this case.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

Reply via email to