zhuzhurk commented on a change in pull request #9663: [WIP][FLINK-12433][runtime] Implement DefaultScheduler stub
URL: https://github.com/apache/flink/pull/9663#discussion_r326899233
##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/DefaultScheduler.java
##########
@@ -75,10 +137,281 @@ public DefaultScheduler(
 			slotRequestTimeout,
 			shuffleMaster,
 			partitionTracker);
+
+		this.restartBackoffTimeStrategy = restartBackoffTimeStrategy;
+		this.slotRequestTimeout = slotRequestTimeout;
+		this.slotProvider = slotProvider;
+		this.delayExecutor = delayExecutor;
+		this.userCodeLoader = userCodeLoader;
+		this.schedulingStrategyFactory = checkNotNull(schedulingStrategyFactory);
+		this.failoverStrategyFactory = checkNotNull(failoverStrategyFactory);
+		this.executionVertexOperations = checkNotNull(executionVertexOperations);
+		this.executionVertexVersioner = executionVertexVersioner;
+		this.conditionalFutureHandlerFactory = new ConditionalFutureHandlerFactory(executionVertexVersioner);
+	}
+
+	// ------------------------------------------------------------------------
+	// SchedulerNG
+	// ------------------------------------------------------------------------
+
+	@Override
+	public void startSchedulingInternal() {
+		initializeScheduling();
+		schedulingStrategy.startScheduling();
+	}
+
+	private void initializeScheduling() {
+		executionFailureHandler = new ExecutionFailureHandler(failoverStrategyFactory.create(getFailoverTopology()), restartBackoffTimeStrategy);
+		schedulingStrategy = schedulingStrategyFactory.createInstance(this, getSchedulingTopology(), getJobGraph());
+		executionSlotAllocator = new DefaultExecutionSlotAllocator(slotProvider, getInputsLocationsRetriever(), slotRequestTimeout);
+		setTaskFailureListener(new UpdateTaskExecutionStateInDefaultSchedulerListener(this, getJobGraph().getJobID()));
+		prepareExecutionGraphForScheduling();
+	}
+
+	@Override
+	public boolean updateTaskExecutionState(final TaskExecutionState taskExecutionState) {
+		final Optional<ExecutionVertexID> executionVertexIdOptional = getExecutionVertexId(taskExecutionState.getID());
+		if (executionVertexIdOptional.isPresent()) {
+			final ExecutionVertexID executionVertexId = executionVertexIdOptional.get();
+			updateState(taskExecutionState);
+			schedulingStrategy.onExecutionStateChange(executionVertexId, taskExecutionState.getExecutionState());
+			maybeHandleTaskFailure(taskExecutionState, executionVertexId);

Review comment:
I think it is risky that `schedulingStrategy.onExecutionStateChange` can change task state directly in this thread:
* It can produce call chains that are hard to reason about, which makes the code hard to maintain.
* A very long call chain may result in a stack overflow.
* When we invoke `maybeHandleTaskFailure` right after `schedulingStrategy.onExecutionStateChange`, the task state may already have changed within that call chain, so we would be doing failover handling in an unexpected state.

How about defining that `SchedulerOperations#allocateSlotsAndDeploy` does not take effect within the direct invocation, and changing the actions in `allocateSlotsAndDeploy` to be executed in the main thread rather than inline? That way we can assume that no task state change happens while `schedulingStrategy.onExecutionStateChange` is being invoked here.
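
To make the concern concrete, here is a minimal, self-contained sketch of the difference between deploying inline and deferring the deployment. Everything in it is hypothetical and simplified for illustration: `DeferredDeploymentSketch`, `MainThreadQueue`, `SketchScheduler`, and the string-based task states are stand-ins, not Flink's actual classes, and the real main thread executor would run queued actions on a later turn rather than via an explicit `drain()`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical model of the re-entrancy problem; not Flink code. */
public class DeferredDeploymentSketch {

    /** Stand-in for a main-thread executor: queued actions run only when drained. */
    static final class MainThreadQueue {
        private final Queue<Runnable> pending = new ArrayDeque<>();

        void execute(Runnable action) {
            pending.add(action);
        }

        void drain() {
            Runnable next;
            while ((next = pending.poll()) != null) {
                next.run();
            }
        }
    }

    /** Simplified scheduler tracking a single task's state as a string. */
    static final class SketchScheduler {
        final List<String> trace = new ArrayList<>();
        private final MainThreadQueue mainThread = new MainThreadQueue();
        private final boolean deferDeployment;
        private String taskState = "RUNNING";

        SketchScheduler(boolean deferDeployment) {
            this.deferDeployment = deferDeployment;
        }

        void updateTaskExecutionState(String newState) {
            taskState = newState;
            onExecutionStateChange();
            maybeHandleTaskFailure(newState);
            // Emulates the main thread picking up queued work after this call returns.
            mainThread.drain();
        }

        /** Assumption: the strategy reacts to the state change by redeploying. */
        private void onExecutionStateChange() {
            allocateSlotsAndDeploy();
        }

        void allocateSlotsAndDeploy() {
            Runnable deploy = () -> {
                taskState = "DEPLOYING";
                trace.add("deploy");
            };
            if (deferDeployment) {
                mainThread.execute(deploy); // takes effect later
            } else {
                deploy.run(); // takes effect inside the caller's stack
            }
        }

        private void maybeHandleTaskFailure(String reportedState) {
            if ("FAILED".equals(reportedState)) {
                trace.add("failover sees " + taskState);
            }
        }
    }

    public static List<String> run(boolean deferDeployment) {
        SketchScheduler scheduler = new SketchScheduler(deferDeployment);
        scheduler.updateTaskExecutionState("FAILED");
        return scheduler.trace;
    }

    public static void main(String[] args) {
        System.out.println("direct:   " + run(false)); // [deploy, failover sees DEPLOYING]
        System.out.println("deferred: " + run(true));  // [failover sees FAILED, deploy]
    }
}
```

In the direct variant the failure handler observes a state that was already overwritten inside the strategy callback. In the deferred variant it observes the state that was just reported, and the deployment runs afterwards with a shallow stack.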