[ 
https://issues.apache.org/jira/browse/HADOOP-15538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16513989#comment-16513989
 ] 

Daryn Sharp commented on HADOOP-15538:
--------------------------------------

The IPC client and parameter sending threads should be the only threads using 
the socket channel.  Note there's no other parameter sending threads other than 
the one reported in the deadlock.  This implies a dead thread is holding the 
monitor which would be an insidious jvm bug.

Here's what produces the UNKNOWN_owner_addr output.  Note the comment "blocked 
permanently".
{code:java}
      currentThread = Threads::owning_thread_from_monitor_owner(
                        (address)waitingToLockMonitor->owner(),
                        false /* no locking needed */);
      if (currentThread == NULL) {
        // The deadlock was detected at a safepoint so the JavaThread
        // that owns waitingToLockMonitor should be findable, but
        // if it is not findable, then the previous currentThread is
        // blocked permanently.
        st->print("%s UNKNOWN_owner_addr=" PTR_FORMAT, owner_desc,
                  (address)waitingToLockMonitor->owner());
        continue;
      }
{code}

Are there any log lines indicating an OOM or other serious problem occurred?  
Maybe native code somehow killed the thread directly or by mangling memory?

Next time it happens, run pstack to check for 
{{SharedRuntime::handle_wrong_method}}.  There is a (supposedly) fixed jvm bug 
that causes a process to go into an infinite loop which eats 100% cpu.  That 
said, was the process idle or consuming cpu?

> Possible RPC deadlock in Client
> -------------------------------
>
>                 Key: HADOOP-15538
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15538
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: common
>            Reporter: Yongjun Zhang
>            Priority: Major
>         Attachments: t1+13min.jstack, t1.jstack
>
>
> We have a jstack collection that spans 13 minutes. One frame per ~1.5 
> minutes. And for each of the frame, I observed the following:
> {code:java}
> Found one Java-level deadlock:
> =============================
> "IPC Parameter Sending Thread #294":
>   waiting to lock monitor 0x00007f68f21f3188 (object 0x0000000621745390, a 
> java.lang.Object),
>   which is held by UNKNOWN_owner_addr=0x00007f68332e2800
> Java stack information for the threads listed above:
> ===================================================
> "IPC Parameter Sending Thread #294":
>         at 
> sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:268)
>         - waiting to lock <0x0000000621745390> (a java.lang.Object)
>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461)
>         - locked <0x0000000621745380> (a java.lang.Object)
>         at 
> org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
>         at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>         at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
>         at 
> org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
>         at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
>         at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
>         - locked <0x0000000621749850> (a java.io.BufferedOutputStream)
>         at java.io.DataOutputStream.flush(DataOutputStream.java:123)
>         at org.apache.hadoop.ipc.Client$Connection$3.run(Client.java:1072)
>         - locked <0x000000062174b878> (a java.io.DataOutputStream)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Found one Java-level deadlock:
> =============================
> "IPC Client (297602875) connection to x.y.z.p:8020 from impala":
>   waiting to lock monitor 0x00007f68f21f3188 (object 0x0000000621745390, a 
> java.lang.Object),
>   which is held by UNKNOWN_owner_addr=0x00007f68332e2800
> Java stack information for the threads listed above:
> ===================================================
> "IPC Client (297602875) connection to x.y.z.p:8020 from impala":
>         at 
> sun.nio.ch.SocketChannelImpl.readerCleanup(SocketChannelImpl.java:279)
>         - waiting to lock <0x0000000621745390> (a java.lang.Object)
>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:390)
>         - locked <0x0000000621745370> (a java.lang.Object)
>         at 
> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
>         at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
>         at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>         at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
>         at java.io.FilterInputStream.read(FilterInputStream.java:133)
>         at java.io.FilterInputStream.read(FilterInputStream.java:133)
>         at 
> org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:553)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>         - locked <0x00000006217476f0> (a java.io.BufferedInputStream)
>         at java.io.DataInputStream.readInt(DataInputStream.java:387)
>         at 
> org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1113)
>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1006)
> Found 2 deadlocks.
> {code}
> This happens with jdk1.8.0_162 on 2.6.32-696.18.7.el6.x86_64.
> The code appears to match 
> [https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/tree/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java].
> The first thread is blocked at:
> [https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/blob/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java?line=268]
> The second thread is blocked at:
>  
> [https://insight.io/github.com/AdoptOpenJDK/openjdk-jdk8u/blob/dev/jdk/src/share/classes/sun/nio/ch/SocketChannelImpl.java?line=279]
> There are two issues here:
>  # There seems to be a real deadlock because the stacks remain the same even 
> if the first an last jstack frames captured is 13 minutes apart.
>  # Java deadlock report seems to be problematic, two threads that have 
> deadlock should not be blocked on the same lock, but they appear to be in 
> this case: the same SocketChannelImpl's stateLock.
> I found a relevant jdk jira 
> [https://bugs.openjdk.java.net/browse/JDK-8007476], it explains where two 
> deadlocks are reported and they are really for the same deadlock.
> I don't see a similar report about this issue in jdk jira database, and I'm 
> thinking about filing a jdk jira for that, but would like to throw some 
> discussion here before that.
> Issue#1 is important, because the client is hanging, which indicate a real 
> problem; however, without a correct report before issue#2 is fixed, it's not 
> clear how the deadlock really looks like.
> Thanks.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to