zhuzhurk commented on a change in pull request #9663:
[WIP][FLINK-12433][runtime] Implement DefaultScheduler stub
URL: https://github.com/apache/flink/pull/9663#discussion_r326899233
##########
File path:
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/DefaultScheduler.java
##########
@@ -75,10 +137,281 @@ public DefaultScheduler(
slotRequestTimeout,
shuffleMaster,
partitionTracker);
+
+ this.restartBackoffTimeStrategy = restartBackoffTimeStrategy;
+ this.slotRequestTimeout = slotRequestTimeout;
+ this.slotProvider = slotProvider;
+ this.delayExecutor = delayExecutor;
+ this.userCodeLoader = userCodeLoader;
+ this.schedulingStrategyFactory = checkNotNull(schedulingStrategyFactory);
+ this.failoverStrategyFactory = checkNotNull(failoverStrategyFactory);
+ this.executionVertexOperations = checkNotNull(executionVertexOperations);
+ this.executionVertexVersioner = executionVertexVersioner;
+ this.conditionalFutureHandlerFactory = new ConditionalFutureHandlerFactory(executionVertexVersioner);
+ }
+
+ // ------------------------------------------------------------------------
+ // SchedulerNG
+ // ------------------------------------------------------------------------
+
+ @Override
+ public void startSchedulingInternal() {
+ initializeScheduling();
+ schedulingStrategy.startScheduling();
+ }
+
+ private void initializeScheduling() {
+ executionFailureHandler = new ExecutionFailureHandler(failoverStrategyFactory.create(getFailoverTopology()), restartBackoffTimeStrategy);
+ schedulingStrategy = schedulingStrategyFactory.createInstance(this, getSchedulingTopology(), getJobGraph());
+ executionSlotAllocator = new DefaultExecutionSlotAllocator(slotProvider, getInputsLocationsRetriever(), slotRequestTimeout);
+ setTaskFailureListener(new UpdateTaskExecutionStateInDefaultSchedulerListener(this, getJobGraph().getJobID()));
+ prepareExecutionGraphForScheduling();
+ }
+
+ @Override
+ public boolean updateTaskExecutionState(final TaskExecutionState taskExecutionState) {
+ final Optional<ExecutionVertexID> executionVertexIdOptional = getExecutionVertexId(taskExecutionState.getID());
+ if (executionVertexIdOptional.isPresent()) {
+ final ExecutionVertexID executionVertexId = executionVertexIdOptional.get();
+ updateState(taskExecutionState);
+ schedulingStrategy.onExecutionStateChange(executionVertexId, taskExecutionState.getExecutionState());
+ maybeHandleTaskFailure(taskExecutionState, executionVertexId);
Review comment:
I think it is risky to let `schedulingStrategy.onExecutionStateChange`
change task state directly in this thread:
* It can produce call chains that are hard to reason about, which makes the
code hard to maintain.
* A very long call chain may result in a stack overflow.
* When we invoke `maybeHandleTaskFailure` right after
`schedulingStrategy.onExecutionStateChange`, the task state may already have
changed somewhere in that call chain, so failover handling would run against an
unexpected state.
How about defining that `SchedulerOperations#allocateSlotsAndDeploy` does not
take effect within the direct invocation, and then changing the actions in
`allocateSlotsAndDeploy` to be submitted for execution in the main thread?
That way we can assume that no task state change happens while
`schedulingStrategy.onExecutionStateChange` is being invoked here.
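To make the proposal concrete, here is a minimal, self-contained sketch of the idea. It is not Flink code: `DeferredDeploymentSketch`, `MainThreadQueue`, the string-based vertex IDs and states, and the method bodies are all hypothetical stand-ins, assuming only that the main thread can be modelled as a FIFO queue of actions.

```java
import java.util.ArrayDeque;
import java.util.Map;
import java.util.Queue;
import java.util.TreeMap;

public class DeferredDeploymentSketch {

    /** Hypothetical stand-in for the scheduler's main-thread executor: a plain FIFO queue. */
    static final class MainThreadQueue {
        private final Queue<Runnable> pending = new ArrayDeque<>();

        void execute(Runnable action) {
            pending.add(action);
        }

        /** Runs queued actions in order; actions enqueued while draining run afterwards. */
        void drain() {
            Runnable next;
            while ((next = pending.poll()) != null) {
                next.run();
            }
        }
    }

    final MainThreadQueue mainThread = new MainThreadQueue();
    /** Vertex name -> state name; plain strings keep the sketch independent of Flink types. */
    final Map<String, String> states = new TreeMap<>();
    /** Snapshot of all states as seen by the (simulated) failure handling step. */
    String statesSeenByFailureHandling;

    /** Deferred variant: the direct call only enqueues work and changes no task state. */
    void allocateSlotsAndDeploy(String vertex) {
        mainThread.execute(() -> states.put(vertex, "DEPLOYING"));
    }

    /** Stand-in for the scheduling strategy reacting to a state change. */
    void onExecutionStateChange(String vertex, String newState) {
        if ("FINISHED".equals(newState)) {
            allocateSlotsAndDeploy(vertex + "-downstream");
        }
    }

    void updateTaskExecutionState(String vertex, String newState) {
        states.put(vertex, newState);
        onExecutionStateChange(vertex, newState);
        // Because deployment was only enqueued, no other task state has changed yet.
        statesSeenByFailureHandling = states.toString();
    }

    public static void main(String[] args) {
        DeferredDeploymentSketch scheduler = new DeferredDeploymentSketch();
        scheduler.states.put("v1", "RUNNING");
        scheduler.states.put("v1-downstream", "CREATED");

        scheduler.updateTaskExecutionState("v1", "FINISHED");
        System.out.println("seen by failure handling: " + scheduler.statesSeenByFailureHandling);

        scheduler.mainThread.drain(); // the deferred deployment takes effect only now
        System.out.println("after draining main thread: " + scheduler.states);
    }
}
```

In this sketch the downstream vertex stays in its old state until the queue is drained, so the step that follows `onExecutionStateChange` works against the state it expects, and the call stack stays flat because each deployment starts from the queue rather than from inside the previous callback.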