[
https://issues.apache.org/jira/browse/FLINK-29330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606539#comment-17606539
]
Matthias Pohl commented on FLINK-29330:
---------------------------------------
I guess, it's a good idea. The shutdown logic is hard to understand in general.
Having some consistent log output will definitely help identifying the root
cause.
When going through the shutdown process ([sequence
diagram|https://gist.github.com/XComp/ad221f2ef33e99ac7b9b03113889b918]) I
noticed that we do not use {{AutoClosableAsync}} in all the cases where it
would be useful (more specifically, {{LeaderRetrievalService}}). This might be
a followup aligning the shutdown process in this sense.
> Provide better logs of MiniCluster shutdown procedure
> -----------------------------------------------------
>
> Key: FLINK-29330
> URL: https://issues.apache.org/jira/browse/FLINK-29330
> Project: Flink
> Issue Type: Technical Debt
> Components: Tests
> Reporter: Chesnay Schepler
> Assignee: Chesnay Schepler
> Priority: Major
> Fix For: 1.17.0
>
>
> I recently ran into an issue where the shutdown of a MiniCluster timed out.
> The logs weren't helpful at all and I had to go in and check every
> asynchronously component for whether _that_ component was the cause.
> The main issues were that various components don't log anything at all, or
> that when they did it wasn't clear who owned that component.
> I'd like to add a util that makes it easier for us log the start/stop of a
> shutdown procedure,
> {code:java}
> public class ShutdownLog {
> /**
> * Logs the beginning and end of the shutdown procedure for the given
> component.
> *
> * <p>This method accepts a {@link Supplier} instead of a {@link
> CompletableFuture} because the
> * latter usually required implies the shutdown to already have begun.
> *
> * @param log Logger of owning component
> * @param component component that will be shut down
> * @param shutdownTrigger component shutdown trigger
> * @return termination future of the component
> */
> public static <C> CompletableFuture<C> logShutdown(
> Logger log, String component, Supplier<CompletableFuture<C>>
> shutdownTrigger) {
> log.debug("Starting shutdown of {}.", component);
> return FutureUtils.logCompletion(log, "shutdown of " + component,
> shutdownTrigger.get());
> }
> }
> public class FutureUtils {
> public static <T> CompletableFuture<T> logCompletion(
> Logger log, String action, CompletableFuture<T> future) {
> future.handle(
> (t, throwable) -> {
> if (throwable == null) {
> log.debug("Completed {}.", action);
> } else {
> log.debug("Failed {}.", action, throwable);
> }
> return null;
> });
> return future;
> }
> ...
> {code}
> and extend the AutoCloseableAsync interface for an easy opt-in and customized
> logging:
> {code:java}
> default CompletableFuture<Void> closeAsync(Logger log) {
> return ShutdownLog.logShutdown(log, getClass().getSimpleName(),
> this::closeAsync);
> }
> {code}
> MiniCluster example usages:
> {code:java}
> -terminationFutures.add(dispatcherResourceManagerComponent.closeAsync())
> +terminationFutures.add(dispatcherResourceManagerComponent.closeAsync(LOG))
> {code}
> {code:java}
> -return ExecutorUtils.nonBlockingShutdown(
> - executorShutdownTimeoutMillis, TimeUnit.MILLISECONDS, ioExecutor);
> +return ShutdownLog.logShutdown(
> + LOG,
> + "ioExecutor",
> + () ->
> + ExecutorUtils.nonBlockingShutdown(
> + executorShutdownTimeoutMillis,
> + TimeUnit.MILLISECONDS,
> + ioExecutor));
> {code}
> [~mapohl] I'm interested what you think about this.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)