[
https://issues.apache.org/jira/browse/HADOOP-18217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526555#comment-17526555
]
Steve Loughran commented on HADOOP-18217:
-----------------------------------------
hmm.
i'd thought that code would fail fast if the hook was called while active, but
it looks like System.exit() has its own lock so doesn't get that far.
we cannot have a thread pool for shutdown because during shutdown there's no
guarantee that threads can be created. hence we have that single executor
created in advance.
i like the idea about having the parallel thread to sleep then halt(); it
guarantees that thread won't be calling anything which could itself call
System.exit(). fancy submitting a patch?
note, it wouldn't quite do what we do now, which is to continue to shut down
the other hooks if one times out. historically it's been network/service outage
problems which caused some hook to fail.
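
A minimal sketch of the sleep-then-halt idea discussed above, in Java. The class name HaltWatchdog and its methods are hypothetical and are not part of Hadoop. The timeout action is passed in as a Runnable only so the logic can be exercised without killing the JVM. The sketch does not address the concern raised above that threads may not be creatable during shutdown, and it does not keep running the remaining hooks when one times out.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Hadoop: a daemon thread that sleeps for
// the configured timeout and then runs an action (normally Runtime.halt).
public class HaltWatchdog {

    /**
     * Starts a daemon thread that sleeps for the timeout, then runs onTimeout.
     * Interrupting the returned thread cancels the watchdog.
     */
    public static Thread start(long timeout, TimeUnit unit, Runnable onTimeout) {
        Thread watchdog = new Thread(() -> {
            try {
                unit.sleep(timeout);
            } catch (InterruptedException e) {
                return; // shutdown finished in time; nothing to do
            }
            onTimeout.run();
        }, "shutdown-watchdog");
        watchdog.setDaemon(true);
        watchdog.start();
        return watchdog;
    }

    /**
     * Variant that halts the JVM on timeout. Runtime.halt does not run
     * shutdown hooks, so this thread never re-enters the hook machinery.
     * The exit status 1 is an arbitrary choice for this sketch.
     */
    public static Thread startHalting(long timeout, TimeUnit unit) {
        return start(timeout, unit, () -> Runtime.getRuntime().halt(1));
    }
}
```

Under these assumptions, the shutdown thread would call HaltWatchdog.startHalting(...) before running the hooks inline, so a stuck hook ends with the JVM being halted instead of hanging.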
> shutdownhookmanager should not be multithreaded (deadlock possible)
> -------------------------------------------------------------------
>
> Key: HADOOP-18217
> URL: https://issues.apache.org/jira/browse/HADOOP-18217
> Project: Hadoop Common
> Issue Type: Bug
> Components: util
> Affects Versions: 2.10.1
> Environment: linux, windows, any version
> Reporter: Catherinot Remi
> Priority: Minor
>
> The ShutdownHookManager class uses an executor to run hooks so that it can
> apply a "timeout" to them. It does this using a single-threaded executor.
> This can lead to a deadlock, leaving a JVM that never shuts down, with this
> execution flow:
> * JVM needs to exit (only daemon threads remaining or someone called
> System.exit)
> * ShutdownHookManager kicks in
> * SHMngr executor starts running some hooks
> * SHMngr executor thread kicks in and, as a side effect, runs some code from
> one of the hooks that calls System.exit (as a side effect from an external lib
> for example)
> * the executor thread waits for a lock because another thread already
> entered System.exit and holds its internal lock, so the executor never returns.
> * SHMngr never returns
> * 1st call to System.exit never returns
> * JVM stuck
>
> Using an executor with a single thread gives "fake" timeouts: the task keeps
> running; you can interrupt it, but until it stumbles upon some piece of code
> that is interruptible (like an IO) it will keep running. This matters
> especially because the executor is single-threaded. So it has this bug, for
> example:
> * caller submits 1st hook (a bad one that would need 1 hour of runtime and
> that cannot be interrupted)
> * executor starts 1st hook
> * caller waiting on the 1st hook's future times out
> * caller submits 2nd hook
> * bug: 1st hook is still running, 2nd hook triggers a timeout but never got
> the chance to run anyway. So one faulty hook makes it impossible for any
> other hook to run: running hooks in a single separate thread does not allow
> other hooks to run in parallel with long ones.
>
> If we really want to time out the JVM shutdown, even accepting a possibly
> dirty shutdown, it should rather run the hooks inside the initial thread
> (not spawning new ones, so not triggering the deadlock described in the
> first place) and, if a timeout was configured, only spawn a single parallel
> daemon thread that sleeps for the timeout delay and then calls Runtime.halt
> (which bypasses the hook system so should not trigger the deadlock). If the
> normal System.exit ends before the timeout delay, everything is fine. If the
> System.exit took too much time, the JVM is killed, and so the reason why this
> multithreaded shutdown hook implementation was created is satisfied
> (avoiding hanging JVMs).
>
> Had the bug with both Oracle and OpenJDK builds, all on major version 1.8.
> Hadoop 2.6 and 2.7 did not have the issue because they do not run hooks in
> another thread.
>
> Another solution is of course to configure the timeout AND to have as many
> threads as needed to run the hooks, so as to have at least some gain to
> offset the pain of the deadlock scenario.
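
The "fake" timeout scenario in the quoted description can be reproduced in isolation with a small Java sketch. The class FakeTimeoutDemo and its method are hypothetical and only imitate the pattern described (submit a hook to a single-threaded executor, wait on the future with a timeout, cancel on timeout); this is not the ShutdownHookManager code itself. The first hook is modelled as a busy loop that never checks its interrupt flag.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo, not Hadoop code: two "hooks" run one after the other
// on a single-threaded executor, each with the same timeout.
public class FakeTimeoutDemo {

    /** Returns true if the second hook ever started running. */
    public static boolean secondHookRan(long firstHookMillis, long timeoutMillis)
            throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "hook-runner");
            t.setDaemon(true); // do not keep the demo JVM alive
            return t;
        });
        AtomicBoolean secondStarted = new AtomicBoolean(false);

        // First hook: spins for firstHookMillis and ignores interrupts.
        Future<?> first = executor.submit(() -> {
            long end = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(firstHookMillis);
            while (System.nanoTime() < end) {
                // busy-wait; the interrupt flag is never checked
            }
        });
        try {
            first.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            first.cancel(true); // only sets the interrupt flag; the loop keeps running
        }

        // Second hook: queued behind the first on the only worker thread.
        Future<?> second = executor.submit(() -> secondStarted.set(true));
        try {
            second.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            second.cancel(true); // cancelled while still queued, so it never runs
        }

        boolean ran = secondStarted.get();
        executor.shutdownNow();
        return ran;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("second hook ran: " + secondHookRan(2000, 200));
    }
}
```

When the first hook outlasts both timeouts, the second hook's timeout expires while it is still queued, which is the behaviour the description calls a bug.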