Tartarus0zm commented on a change in pull request #13319:
URL: https://github.com/apache/flink/pull/13319#discussion_r490285294
##########
File path:
flink-runtime/src/main/java/org/apache/flink/runtime/entrypoint/component/DispatcherResourceManagerComponent.java
##########
@@ -74,22 +79,46 @@
@Nonnull ResourceManager<?> resourceManager,
@Nonnull LeaderRetrievalService dispatcherLeaderRetrievalService,
@Nonnull LeaderRetrievalService resourceManagerRetrievalService,
- @Nonnull WebMonitorEndpoint<?> webMonitorEndpoint) {
+ @Nonnull WebMonitorEndpoint<?> webMonitorEndpoint,
+ @Nonnull FatalErrorHandler fatalErrorHandler,
+ @Nonnull CompletableFuture<DispatcherGateway> dispatcherGatewayCompletableFuture) {
this.dispatcherRunner = dispatcherRunner;
this.resourceManager = resourceManager;
this.dispatcherLeaderRetrievalService = dispatcherLeaderRetrievalService;
this.resourceManagerRetrievalService = resourceManagerRetrievalService;
this.webMonitorEndpoint = webMonitorEndpoint;
+ this.fatalErrorHandler = fatalErrorHandler;
this.terminationFuture = new CompletableFuture<>();
this.shutDownFuture = new CompletableFuture<>();
registerShutDownFuture();
+ failOnPrematureTermination(dispatcherGatewayCompletableFuture);
}
private void registerShutDownFuture() {
FutureUtils.forward(dispatcherRunner.getShutDownFuture(), shutDownFuture);
}
+ private void failOnPrematureTermination(CompletableFuture<DispatcherGateway> dispatcherGatewayCompletableFuture) {
+ dispatcherGatewayCompletableFuture.whenComplete((dispatcher, throwable) -> {
+ if (dispatcher != null && dispatcher instanceof Dispatcher) {
Review comment:
If we instead use the shut-down future of the `DispatcherRunner`, like this:
```
private void failOnPrematureTermination() {
    CompletableFuture.anyOf(dispatcherRunner.getShutDownFuture(), resourceManager.getTerminationFuture())
        .whenComplete((ignored, t) -> {
            // t is null when the first future to finish completed normally.
            if (t == null) {
                LOG.error("Dispatcher/ResourceManager shut down because something unexpected happened!");
            }
            if (isRunning.get()) {
                LOG.warn("DispatcherResourceManagerComponent shut down because the Dispatcher/ResourceManager terminated unexpectedly.");
                // Note: t may still be null here, see the check above.
                fatalErrorHandler.onFatalError(t);
            }
        });
}
```
then in per-job mode `ClusterEntrypoint.shutDownAsync()` may never get a chance
to run, because `ClusterEntrypoint.onFatalError` calls
`System.exit(RUNTIME_FAILURE_RETURN_CODE)` first.
Do you have any suggestions for how to revise this?
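
To make the ordering concern concrete, here is a minimal sketch that uses only plain `java.util.concurrent` types. `Component`, `events`, `dispatcherFinishesFirst` and `closeRequestedFirst` are hypothetical names for illustration and are not Flink classes; the `"fatalError"` event stands in for the point where `ClusterEntrypoint.onFatalError` would call `System.exit`. It assumes the callback runs synchronously on the completing thread, which holds for a non-async `whenComplete` registered before completion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

public class PrematureTerminationSketch {

    /** Hypothetical stand-in for the component; records events instead of exiting the JVM. */
    static final class Component {
        final AtomicBoolean isRunning = new AtomicBoolean(true);
        final List<String> events = new ArrayList<>();

        Component(CompletableFuture<?> dispatcherShutDown, CompletableFuture<?> rmTermination) {
            CompletableFuture.anyOf(dispatcherShutDown, rmTermination)
                    .whenComplete((ignored, t) -> {
                        if (isRunning.get()) {
                            // In the real entrypoint, System.exit(...) would happen here.
                            events.add("fatalError");
                        }
                    });
        }

        /** Marks the component as stopped; only the first call records an event. */
        void closeAsync() {
            if (isRunning.compareAndSet(true, false)) {
                events.add("closeAsync");
            }
        }
    }

    /** The dispatcher future completes before anyone asks the component to close. */
    static List<String> dispatcherFinishesFirst() {
        CompletableFuture<Void> dispatcher = new CompletableFuture<>();
        Component component = new Component(dispatcher, new CompletableFuture<Void>());
        dispatcher.complete(null);
        component.closeAsync();
        return component.events;
    }

    /** The component is closed first, then the dispatcher future completes. */
    static List<String> closeRequestedFirst() {
        CompletableFuture<Void> dispatcher = new CompletableFuture<>();
        Component component = new Component(dispatcher, new CompletableFuture<Void>());
        component.closeAsync();
        dispatcher.complete(null);
        return component.events;
    }

    public static void main(String[] args) {
        System.out.println(dispatcherFinishesFirst());
        System.out.println(closeRequestedFirst());
    }
}
```

In the first scenario the fatal-error branch is recorded before `closeAsync`, so with a real `System.exit` the orderly shut-down would never run. In the second scenario only `closeAsync` is recorded, because `isRunning` is already false when the future completes.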
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.