[ 
https://issues.apache.org/jira/browse/HADOOP-13709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15573286#comment-15573286
 ] 

Eric Badger commented on HADOOP-13709:
--------------------------------------

[~daryn], good catch! I was able to reproduce the hung test locally and took a 
jstack. It is indeed a deadlock on the stdout stream: one thread is reading 
from it while another is trying to close it, and each holds the lock the other 
needs. 

I'm not super thrilled about using a shutdown hook to clean up the processes, 
but in this case I think it will work if we can't think of anything else. 
[~jlowe], do you have any additional insights? 
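
For discussion, here is a minimal sketch of what the shutdown-hook approach 
could look like. It is not the attached patch: ChildProcessReaper, launch, 
untrack and destroyAll are hypothetical names used only for illustration, and 
the sketch assumes Java 8 (Process.isAlive, ConcurrentHashMap.newKeySet). 

```java
import java.io.IOException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical helper (not part of Hadoop): remembers every child process it
 * launches and destroys the ones still alive when the JVM shuts down.
 */
final class ChildProcessReaper {
  // Concurrent set so that launching, untracking and the hook can overlap safely.
  private static final Set<Process> CHILDREN = ConcurrentHashMap.newKeySet();

  static {
    // Registered once, when the class is first loaded.
    Runtime.getRuntime().addShutdownHook(
        new Thread(ChildProcessReaper::destroyAll, "child-process-reaper"));
  }

  private ChildProcessReaper() {}

  /** Starts the command and tracks the resulting process. */
  static Process launch(String... command) throws IOException {
    Process p = new ProcessBuilder(command).start();
    CHILDREN.add(p);
    return p;
  }

  /** Call after a child has exited normally so the set does not grow forever. */
  static void untrack(Process p) {
    CHILDREN.remove(p);
  }

  /** Destroys every tracked child that is still alive; returns how many were destroyed. */
  static int destroyAll() {
    int destroyed = 0;
    for (Process p : CHILDREN) {
      if (p.isAlive()) {
        p.destroy();   // only reaches the direct child, not its descendants
        destroyed++;
      }
      CHILDREN.remove(p);  // safe: the set's iterator is weakly consistent
    }
    return destroyed;
  }
}
```

Two limits of this approach are worth keeping in mind: shutdown hooks do not 
run if the JVM is killed with SIGKILL or stopped via Runtime.halt, and 
Process.destroy only signals the immediate child, so grandchildren can still 
be orphaned. 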

{noformat}
Found one Java-level deadlock:
=============================
"pool-4-thread-1":
  waiting to lock monitor 0x00007f5900002d78 (object 0x00000000d67243a0, a java.lang.UNIXProcess$ProcessPipeInputStream),
  which is held by "Thread-0"
"Thread-0":
  waiting to lock monitor 0x00007f58f8006008 (object 0x00000000d672ec28, a java.io.InputStreamReader),
  which is held by "pool-4-thread-1"

Java stack information for the threads listed above:
===================================================
"pool-4-thread-1":
        at java.io.BufferedInputStream.read(BufferedInputStream.java:336)
        - waiting to lock <0x00000000d67243a0> (a java.lang.UNIXProcess$ProcessPipeInputStream)
        at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
        - locked <0x00000000d672ec28> (a java.io.InputStreamReader)
        at java.io.InputStreamReader.read(InputStreamReader.java:184)
        at java.io.BufferedReader.fill(BufferedReader.java:161)
        at java.io.BufferedReader.read1(BufferedReader.java:212)
        at java.io.BufferedReader.read(BufferedReader.java:286)
        - locked <0x00000000d672ec28> (a java.io.InputStreamReader)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.parseExecResult(Shell.java:1214)
        at org.apache.hadoop.util.Shell$2.call(Shell.java:965)
        at org.apache.hadoop.util.Shell$2.call(Shell.java:962)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
"Thread-0":
        at java.io.BufferedReader.close(BufferedReader.java:522)
        - waiting to lock <0x00000000d672ec28> (a java.io.InputStreamReader)
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:1029)
        - locked <0x00000000d67243a0> (a java.lang.UNIXProcess$ProcessPipeInputStream)
        at org.apache.hadoop.util.Shell.run(Shell.java:883)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1201)
        at org.apache.hadoop.util.TestShell.testShellCommandTimerLeak(TestShell.java:241)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
        at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
        at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
        at org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)

Found 1 deadlock.
{noformat}
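
The jstack above shows a lock-order inversion: Thread-0 holds the stream 
monitor and wants the reader monitor, while pool-4-thread-1 holds the reader 
monitor and wants the stream monitor. The standalone sketch below reproduces 
that pattern with two plain objects standing in for the two monitors and then 
detects it with ThreadMXBean. LockInversionDemo and provokeAndDetect are 
hypothetical names; this is an illustration of the pattern, not the Shell.java 
code. 

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

/** Hypothetical demo: two threads take the same two monitors in opposite order. */
final class LockInversionDemo {
  private LockInversionDemo() {}

  /**
   * Starts the two threads and polls for a monitor deadlock.
   * Returns the number of deadlocked threads found, or 0 if none within the timeout.
   */
  static int provokeAndDetect(long timeoutMillis) throws InterruptedException {
    final Object streamLock = new Object();  // stands in for the ProcessPipeInputStream monitor
    final Object readerLock = new Object();  // stands in for the InputStreamReader monitor
    final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

    // Like pool-4-thread-1: holds the reader lock, then needs the stream lock.
    Thread reader = new Thread(() -> {
      synchronized (readerLock) {
        awaitBoth(bothHoldFirstLock);
        synchronized (streamLock) {
          // never reached once the deadlock forms
        }
      }
    }, "reader");

    // Like Thread-0: holds the stream lock, then needs the reader lock.
    Thread closer = new Thread(() -> {
      synchronized (streamLock) {
        awaitBoth(bothHoldFirstLock);
        synchronized (readerLock) {
          // never reached once the deadlock forms
        }
      }
    }, "closer");

    // Daemon threads so the stuck pair does not keep the JVM alive.
    reader.setDaemon(true);
    closer.setDaemon(true);
    reader.start();
    closer.start();

    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    long deadline = System.currentTimeMillis() + timeoutMillis;
    while (System.currentTimeMillis() < deadline) {
      long[] ids = mx.findMonitorDeadlockedThreads();  // null when no deadlock exists
      if (ids != null) {
        return ids.length;
      }
      Thread.sleep(10);
    }
    return 0;
  }

  // Each thread signals that it holds its first lock, then waits for the other.
  private static void awaitBoth(CountDownLatch latch) {
    latch.countDown();
    try {
      latch.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

The latch forces the bad interleaving every time, which is why the sketch 
deadlocks reliably while the real test only hangs intermittently. Taking the 
two locks in the same order in both threads, or closing the reader without 
holding the stream monitor, would remove the cycle. 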

> Clean up subprocesses spawned by Shell.java:runCommand when the shell process 
> exits
> -----------------------------------------------------------------------------------
>
>                 Key: HADOOP-13709
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13709
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: Eric Badger
>            Assignee: Eric Badger
>         Attachments: HADOOP-13709.001.patch
>
>
> The runCommand code in Shell.java can get into a state where it ignores 
> InterruptedExceptions and refuses to shut down because it is blocked in I/O, 
> waiting for the return value of the subprocess it spawned. We need to allow 
> the subprocess to be interrupted and killed when the shell process is killed. 
> Currently the JVM shuts down and all of the subprocesses are orphaned rather 
> than killed.


