[
https://issues.apache.org/jira/browse/HADOOP-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16514004#comment-16514004
]
Daryn Sharp commented on HADOOP-15538:
--------------------------------------
Just noticed a possible clue:
{noformat}
"main" #1 prio=5 os_prio=0 tid=0x0000000004cd8000 nid=0xf1491 runnable
[0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"VM Thread" os_prio=0 tid=0x0000000004f39000 nid=0xf163e runnable
{noformat}
The main thread has no stack trace. Did you clip it out? If not, that's
indicative of the JVM trying to shut down. It may have detected a thread death
or some other nasty issue but is stuck waiting for all the non-daemon threads
to exit. Look for an hs_err_pid file.
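For illustration only (nothing from the attached dumps), a minimal sketch of
how a single stuck non-daemon thread holds a JVM in that shutdown-wait state:
{code:java}
// Minimal sketch: a non-daemon thread keeps the process alive after main()
// returns, which is the shutdown-wait state described above.
public class NonDaemonHang {
    public static void main(String[] args) {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(Long.MAX_VALUE); // simulates a stuck worker
            } catch (InterruptedException ignored) {
            }
        });
        // t.setDaemon(true);  // uncommenting this would let the JVM exit
        t.start();
        // main returns here, but the process stays up until t exits
    }
}
{code}
With the setDaemon(true) line uncommented, the process exits as soon as main
returns, since only non-daemon threads block JVM shutdown.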
> Possible RPC deadlock in Client
> -------------------------------
>
> Key: HADOOP-15538
> URL: https://issues.apache.org/jira/browse/HADOOP-15538
> Project: Hadoop Common
> Issue Type: Bug
> Components: common
> Reporter: Yongjun Zhang
> Priority: Major
> Attachments: t1+13min.jstack, t1.jstack
>
>
> We have a jstack collection that spans 13 minutes, one snapshot every ~1.5
> minutes. In each of those snapshots, I observed the following:
> {code:java}
> Found one Java-level deadlock:
> =============================
> "IPC Parameter Sending Thread #294":
>   waiting to lock monitor 0x00007f68f21f3188 (object 0x0000000621745390, a java.lang.Object),
>   which is held by UNKNOWN_owner_addr=0x00007f68332e2800
> Java stack information for the threads listed above:
> ===================================================
> "IPC Parameter Sending Thread #294":
>         at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:268)
>         - waiting to lock <0x0000000621745390> (a java.lang.Object)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461)
>         - locked <0x0000000621745380> (a java.lang.Object)
>         at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
>         at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>         - locked <0x0000000621749850> (a java.io.BufferedOutputStream)
>         at java.io.DataOutputStream.flush(DataOutputStream.java:123)
>         at org.apache.hadoop.ipc.Client$Connection$3.run(Client.java:1072)
>         - locked <0x000000062174b878> (a java.io.DataOutputStream)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Found one Java-level deadlock:
> =============================
> "IPC Client (297602875) connection to x.y.z.p:8020 from impala":
>   waiting to lock monitor 0x00007f68f21f3188 (object 0x0000000621745390, a java.lang.Object),
>   which is held by UNKNOWN_owner_addr=0x00007f68332e2800
> Java stack information for the threads listed above:
> ===================================================
> "IPC Client (297602875) connection to x.y.z.p:8020 from impala":
>         at sun.nio.ch.SocketChannelImpl.readerCleanup(SocketChannelImpl.java:279)
>         - waiting to lock <0x0000000621745390> (a java.lang.Object)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:390)
>         - locked <0x0000000621745370> (a java.lang.Object)
>         at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
>         at java.io.FilterInputStream.read(FilterInputStream.java:133)
>         at java.io.FilterInputStream.read(FilterInputStream.java:133)
>         at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:553)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>         - locked <0x00000006217476f0> (a java.io.BufferedInputStream)
>         at java.io.DataInputStream.readInt(DataInputStream.java:387)
>         at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1113)
>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1006)
> Found 2 deadlocks.
> {code}
> This happens with jdk1.8.0_162 on 2.6.32-696.18.7.el6.x86_64.
> The code appears to match
> [https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/tree/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java].
> The first thread is blocked at:
> [https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/blob/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java?line=268]
> The second thread is blocked at:
> [https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/blob/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java?line=279]
> There are two issues here:
> # There seems to be a real deadlock, because the stacks remain the same
> even though the first and last jstack snapshots captured are 13 minutes
> apart.
> # The Java deadlock report seems to be problematic: two threads that
> deadlock with each other should not both be blocked on the same lock, yet
> that is what is reported here, with both threads waiting on the same
> SocketChannelImpl stateLock (see the sketch below).
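> To illustrate issue #2, here is a simplified, hypothetical model of the
> JDK 8 SocketChannelImpl lock layout (names are illustrative, not the actual
> JDK source); it shows how both threads can be parked on the same stateLock
> at once:
> {code:java}
> // Hypothetical, simplified model of the SocketChannelImpl lock ordering;
> // field and method bodies are illustrative, not the real JDK code.
> class ChannelModel {
>     private final Object readLock  = new Object();
>     private final Object writeLock = new Object();
>     private final Object stateLock = new Object(); // <0x0000000621745390> in the dump
>
>     int write(java.nio.ByteBuffer buf) {
>         synchronized (writeLock) {         // "IPC Parameter Sending Thread" holds this
>             synchronized (stateLock) {     // ...and waits here (ensureWriteOpen)
>                 // check that the channel is still open and connected
>             }
>             return 0; // actual socket write elided
>         }
>     }
>
>     int read(java.nio.ByteBuffer buf) {
>         synchronized (readLock) {          // "IPC Client ... connection" thread holds this
>             try {
>                 return 0; // actual socket read elided
>             } finally {
>                 synchronized (stateLock) { // ...and waits here (readerCleanup)
>                     // clear reader-thread bookkeeping
>                 }
>             }
>         }
>     }
> }
> {code}
> Both reported threads are waiters on stateLock, so neither can be its
> owner; that is consistent with the UNKNOWN_owner_addr owner being some
> third thread, possibly one executing in native code.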
> I found a relevant jdk jira
> [https://bugs.openjdk.java.net/browse/JDK-8007476]; it explains a case
> where two deadlocks are reported that are really the same deadlock.
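> As a sketch of one way to cross-check the HotSpot report from inside the
> process (illustrative only, we have not run this against the cluster), the
> JVM's own detector can be queried via ThreadMXBean:
> {code:java}
> import java.lang.management.ManagementFactory;
> import java.lang.management.ThreadInfo;
> import java.lang.management.ThreadMXBean;
>
> // Query the JVM's deadlock detector and print each deadlocked thread's
> // blocking lock and its reported owner.
> public class DeadlockProbe {
>     public static void main(String[] args) {
>         ThreadMXBean mx = ManagementFactory.getThreadMXBean();
>         long[] ids = mx.findDeadlockedThreads(); // monitors + ownable synchronizers
>         if (ids == null) {
>             System.out.println("no deadlock detected");
>             return;
>         }
>         for (ThreadInfo info : mx.getThreadInfo(ids, Integer.MAX_VALUE)) {
>             System.out.printf("%s blocked on %s owned by %s%n",
>                 info.getThreadName(), info.getLockName(), info.getLockOwnerName());
>         }
>     }
> }
> {code}
> Of course, if the confusing owner information is a JVM bug, this probe may
> simply print the same misleading report.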
> I don't see a similar report about this issue in the jdk jira database, and
> I'm thinking about filing a jdk jira for it, but I would like to have some
> discussion here first.
> Issue #1 is important because the client is hanging, which indicates a real
> problem; however, until issue #2 is fixed and the report is correct, it's
> not clear what the deadlock really looks like.
> Thanks.