[
https://issues.apache.org/jira/browse/HADOOP-18217?focusedWorklogId=769949&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-769949
]
ASF GitHub Bot logged work on HADOOP-18217:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 12/May/22 22:57
Start Date: 12/May/22 22:57
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on PR #4255:
URL: https://github.com/apache/hadoop/pull/4255#issuecomment-1125491011
:broken_heart: **-1 overall**
| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 53s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 42m 52s | | trunk passed |
| +1 :green_heart: | compile | 25m 1s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | compile | 21m 31s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | checkstyle | 1m 30s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 57s | | trunk passed |
| -1 :x: | javadoc | 1m 34s | [/branch-javadoc-hadoop-common-project_hadoop-common-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4255/2/artifact/out/branch-javadoc-hadoop-common-project_hadoop-common-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt) | hadoop-common in trunk failed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1. |
| +1 :green_heart: | javadoc | 1m 54s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 3m 4s | | trunk passed |
| +1 :green_heart: | shadedclient | 25m 48s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 8s | | the patch passed |
| +1 :green_heart: | compile | 24m 2s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javac | 24m 2s | | the patch passed |
| +1 :green_heart: | compile | 21m 41s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | javac | 21m 41s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 24s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 55s | | the patch passed |
| -1 :x: | javadoc | 1m 28s | [/patch-javadoc-hadoop-common-project_hadoop-common-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4255/2/artifact/out/patch-javadoc-hadoop-common-project_hadoop-common-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt) | hadoop-common in the patch failed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1. |
| +1 :green_heart: | javadoc | 2m 1s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 3m 5s | | the patch passed |
| +1 :green_heart: | shadedclient | 25m 46s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 18m 10s | | hadoop-common in the patch passed. |
| +1 :green_heart: | asflicense | 1m 16s | | The patch does not generate ASF License warnings. |
| | | 228m 29s | | |
| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4255/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/4255 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell |
| uname | Linux 21a14d2e3f09 4.15.0-175-generic #184-Ubuntu SMP Thu Mar 24 17:48:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / f279d644ee91e335c63cd090baf0635edc325655 |
| Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4255/2/testReport/ |
| Max. process+thread count | 3135 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4255/2/console |
| versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
This message was automatically generated.
Issue Time Tracking
-------------------
Worklog Id: (was: 769949)
Time Spent: 1h 10m (was: 1h)
> shutdownhookmanager should not be multithreaded (deadlock possible)
> -------------------------------------------------------------------
>
> Key: HADOOP-18217
> URL: https://issues.apache.org/jira/browse/HADOOP-18217
> Project: Hadoop Common
> Issue Type: Bug
> Components: util
> Affects Versions: 2.10.1
> Environment: linux, windows, any version
> Reporter: Catherinot Remi
> Priority: Minor
> Labels: pull-request-available
> Attachments: wtf.java
>
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> The ShutdownHookManager class uses an executor to run hooks so that it can
> apply a "timeout" to each of them. It does this using a single-threaded
> executor. This can lead to a deadlock that leaves the JVM unable to shut
> down, with this execution flow:
> * The JVM needs to exit (only daemon threads remain, or someone called
> System.exit)
> * ShutdownHookManager kicks in
> * The SHMngr executor starts running some hooks
> * The SHMngr executor thread, as a side effect, runs code from one of the
> hooks that calls System.exit (for example from an external library)
> * The executor thread waits for a lock, because another thread already
> entered System.exit and holds its internal lock, so the executor never
> returns
> * SHMngr never returns
> * The 1st call to System.exit never returns
> * The JVM is stuck
>
> Using an executor only gives "fake" timeouts: the task keeps running after
> the timeout. You can interrupt it, but until it reaches some interruptible
> code (such as I/O) it will keep running. This matters especially because the
> executor is single-threaded. So it has this bug, for example:
> * The caller submits the 1st hook (a bad one that needs 1 hour of runtime and
> cannot be interrupted)
> * The executor starts the 1st hook
> * The caller waiting on the 1st hook's future times out
> * The caller submits the 2nd hook
> * Bug: the 1st hook is still running, so the 2nd hook triggers a timeout
> without ever getting the chance to run. One faulty hook thus prevents every
> later hook from running: a single separate thread cannot run other hooks in
> parallel with long ones.
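
The starvation described in the list above can be reproduced outside Hadoop with a few lines. This is only a sketch under stated assumptions: `SingleThreadTimeoutDemo`, `secondHookRan` and `awaitOrCancel` are hypothetical names, the 200 ms timeout is arbitrary, and the code is not taken from ShutdownHookManager. The first "hook" swallows interrupts to stand in for code that cannot be interrupted.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

public class SingleThreadTimeoutDemo {

    /** Returns true if the 2nd hook got to run, false if it was starved. */
    public static boolean secondHookRan() {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "hook-runner");
            t.setDaemon(true); // keep the demo itself from blocking JVM exit
            return t;
        });
        CountDownLatch release = new CountDownLatch(1);
        AtomicBoolean secondRan = new AtomicBoolean(false);

        // 1st hook: ignores interrupts, so cancel(true) cannot stop it.
        Future<?> first = executor.submit(() -> {
            while (release.getCount() > 0) {
                try {
                    release.await();
                } catch (InterruptedException ignored) {
                    // Swallowed on purpose: simulates non-interruptible code.
                }
            }
        });
        awaitOrCancel(first);

        // 2nd hook: queued behind the 1st one on the only worker thread.
        Future<?> second = executor.submit(() -> secondRan.set(true));
        awaitOrCancel(second);

        boolean ran = secondRan.get();
        release.countDown();    // let the 1st hook finish so the demo can end
        executor.shutdownNow();
        return ran;
    }

    /** Waits up to 200 ms (arbitrary), then cancels the future on timeout. */
    private static void awaitOrCancel(Future<?> future) {
        try {
            future.get(200, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
        } catch (ExecutionException e) {
            // The hook threw; treated as "finished" in this sketch.
        }
    }

    public static void main(String[] args) {
        System.out.println("second hook ran: " + secondHookRan());
    }
}
```

In this sketch the second hook is cancelled before the worker thread ever picks it up, so `secondHookRan()` returns `false`: the caller saw two timeouts, but only one hook actually executed.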
>
> If we really want to put a timeout on the JVM shutdown, even at the cost of a
> possibly dirty shutdown, it should rather run the hooks inside the initial
> thread (not spawning new ones, so not triggering the deadlock described
> above) and, if a timeout was configured, spawn only a single parallel daemon
> thread that sleeps for the timeout delay and then calls Runtime.halt (which
> bypasses the hook system, so it should not trigger the deadlock). If the
> normal System.exit ends before the timeout delay, everything is fine. If
> System.exit takes too much time, the JVM is killed, which satisfies the
> reason this multithreaded shutdown hook implementation was created (avoiding
> hanging JVMs).
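
A minimal sketch of the watchdog idea in the paragraph above, assuming hypothetical names (`ShutdownWatchdog`, `start`, `runHooksWithDeadline`) that do not exist in Hadoop. The timeout action is passed in as a `Runnable` so the sketch can be exercised without killing the JVM. In a real shutdown path it would be something like `() -> Runtime.getRuntime().halt(1)`, where the exit status 1 is also an assumption.

```java
public class ShutdownWatchdog {

    /**
     * Starts a daemon thread that sleeps for the timeout and then runs
     * onTimeout. Interrupting the returned thread disarms the watchdog.
     */
    public static Thread start(long timeoutMillis, Runnable onTimeout) {
        Thread watchdog = new Thread(() -> {
            try {
                Thread.sleep(timeoutMillis);
                onTimeout.run();
            } catch (InterruptedException e) {
                // Hooks finished in time; nothing to do.
            }
        }, "shutdown-watchdog");
        watchdog.setDaemon(true);
        watchdog.start();
        return watchdog;
    }

    /** Runs every hook in the calling thread, guarded by one watchdog. */
    public static void runHooksWithDeadline(Iterable<Runnable> hooks,
                                            long timeoutMillis,
                                            Runnable onTimeout) {
        Thread watchdog = start(timeoutMillis, onTimeout);
        try {
            for (Runnable hook : hooks) {
                try {
                    hook.run();
                } catch (Throwable t) {
                    // A failing hook must not prevent the next ones from running.
                    System.err.println("hook failed: " + t);
                }
            }
        } finally {
            watchdog.interrupt();
        }
    }
}
```

Because the hooks run in the calling thread, no executor thread can end up blocked behind the shutdown lock, and a hook that runs past the deadline is handled by the watchdog rather than by a future timeout.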
>
> Had the bug with both Oracle and OpenJDK builds, all major version 1.8.
> Hadoop 2.6 and 2.7 did not have the issue because they do not run hooks in
> another thread.
>
> Another solution is of course to configure the timeout AND to have as many
> threads as needed to run the hooks, so as to get at least some gain to offset
> the pain of the deadlock scenario.
>
> EDIT: added some logs and reproduced the problem. In fact it occurs after
> all the hook entries have been triggered and before the executor is shut
> down. The current code, after running the hooks, creates a new Configuration
> object, reads the configured timeout from it, and applies this timeout when
> shutting down the executor. I sometimes run with a classloader doing remote
> classloading, and Configuration loads its content using this classloader, so
> when a network error occurs during JVM shutdown, the classloader fails to
> load the resources needed by Configuration. The code then crashes before
> shutting down the executor and ends up inside the thread's default uncaught
> throwable handler, which was calling System.exit and so got stuck. Shutting
> down the executor never returned, and neither did the JVM.
> So, forget about the halt stuff (even if it is a very robust last-resort
> safety net). Still, I'll make a small adjustment to the final executor
> shutdown code to make it slightly more robust to even the strangest
> exceptions/errors it encounters.
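
One way to read the adjustment described in the EDIT is sketched below. It is not the code from PR #4255: `SafeExecutorShutdown`, `shutdown` and `DEFAULT_TIMEOUT_MS` are hypothetical, and the timeout lookup is modelled as a `LongSupplier` standing in for whatever reads the value from Configuration. The point is only that a failure while resolving the timeout falls back to a default instead of skipping the executor shutdown.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

public class SafeExecutorShutdown {

    /** Hypothetical fallback used when the configured value cannot be read. */
    public static final long DEFAULT_TIMEOUT_MS = 30_000L;

    /**
     * Shuts the executor down even if reading the timeout fails.
     * Returns the timeout that was actually applied, in milliseconds.
     */
    public static long shutdown(ExecutorService executor, LongSupplier timeoutMillis) {
        long timeout = DEFAULT_TIMEOUT_MS;
        try {
            timeout = timeoutMillis.getAsLong();
        } catch (Throwable t) {
            // Includes Errors such as a failed class or resource load during
            // JVM shutdown; keep the default rather than abort the shutdown.
            System.err.println("could not read shutdown timeout, using default: " + t);
        }
        executor.shutdown();
        try {
            if (!executor.awaitTermination(timeout, TimeUnit.MILLISECONDS)) {
                executor.shutdownNow();
            }
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return timeout;
    }
}
```

Catching `Throwable` is unusually broad, and it is chosen here only because the failure in the report came from classloading during shutdown, where an `Error` is as likely as an exception.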
--
This message was sent by Atlassian Jira
(v8.20.7#820007)