StephanEwen commented on a change in pull request #12137:
URL: https://github.com/apache/flink/pull/12137#discussion_r426152842
##########
File path:
flink-runtime/src/main/java/org/apache/flink/runtime/scheduler/DefaultScheduler.java
##########
@@ -186,10 +188,18 @@ private void maybeHandleTaskFailure(final
TaskExecutionState taskExecutionState,
private void handleTaskFailure(final ExecutionVertexID
executionVertexId, @Nullable final Throwable error) {
setGlobalFailureCause(error);
+ notifyCoordinatorsAboutTaskFailure(executionVertexId, error);
Review comment:
I think we may need to remove that check anyways. It is quite possible
that tasks fail immediately after they were deployed, before they could
register.
The `OperatorCoordinator` interface also has no notion of "registering", so
it cannot decide which notifications to forward and which not. I'd rather err
on the "forward too many" side, because missing a notification easily leads to
a stall or data loss (split not added back, result is missing).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]