On Oct 11, 2017, at 3:05 AM, Todd Lipcon <[email protected]
<mailto:[email protected]>> wrote:
Hey folks,
Apologies for the slightly off-topic post, since this isn't
performance related, but I hope I'll be excused since this might be
interesting to the group members.
We're recently facing an issue where a JVM is deadlocking in some SSL
code. The resulting jstack report is bizarre -- in the deadlock
analysis section it indicates that one of the locks is held by some
thread, but in that thread's stack, it doesn't show the lock
anywhere. Was curious if anyone had any ideas on how a lock might be
"held but not held".
jstack output is as follows (with other threads and irrelevant bottom
stack frames removed):
Found one Java-level deadlock:
=============================
"Thread-38190":
waiting to lock monitor 0x00000000267f2628 (object
0x00000000802ba7f8, a sun.security.provider.Sun),
which is held by "New I/O worker #1810850"
"New I/O worker #1810850":
waiting to lock monitor 0x000000007482f5f8 (object
0x0000000080ac88f0, a java.lang.Class),
which is held by "New I/O worker #1810853"
"New I/O worker #1810853":
waiting to lock monitor 0x00000000267f2628 (object
0x00000000802ba7f8, a sun.security.provider.Sun),
which is held by "New I/O worker #1810850"
Java stack information for the threads listed above:
===================================================
"Thread-38190":
at java.security.Provider.getService(Provider.java:1035)
- waiting to lock <0x00000000802ba7f8> (a
sun.security.provider.Sun)
at
sun.security.jca.ProviderList.getService(ProviderList.java:332)
at sun.security.jca.GetInstance.getInstance(GetInstance.java:157)
at javax.net.ssl.SSLContext.getInstance(SSLContext.java:156)
at
org.apache.kudu.client.SecurityContext.<init>(SecurityContext.java:84)
...
"New I/O worker #1810850":
at
sun.security.ssl.CipherSuite$BulkCipher.isAvailable(CipherSuite.java:542)
- waiting to lock <0x0000000080ac88f0> (a java.lang.Class for
sun.security.ssl.CipherSuite$BulkCipher)
at
sun.security.ssl.CipherSuite$BulkCipher.isAvailable(CipherSuite.java:527)
at sun.security.ssl.CipherSuite.isAvailable(CipherSuite.java:194)
at
sun.security.ssl.Handshaker.getActiveProtocols(Handshaker.java:712)
at sun.security.ssl.Handshaker.activate(Handshaker.java:498)
at
sun.security.ssl.SSLEngineImpl.kickstartHandshake(SSLEngineImpl.java:729)
- locked <0x000000008eca76f8> (a sun.security.ssl.SSLEngineImpl)
at
sun.security.ssl.SSLEngineImpl.beginHandshake(SSLEngineImpl.java:756)
at
org.apache.kudu.client.shaded.org.jboss.netty.handler.ssl.SslHandler.handshake(SslHandler.java:361)
- locked <0x000000008eca8298> (a java.lang.Object)
at
org.apache.kudu.client.Negotiator.startTlsHandshake(Negotiator.java:432)
...
"New I/O worker #1810853":
at java.security.Provider.getService(Provider.java:1035)
- waiting to lock <0x00000000802ba7f8> (a
sun.security.provider.Sun)
at
sun.security.jca.ProviderList$ServiceList.tryGet(ProviderList.java:444)
at
sun.security.jca.ProviderList$ServiceList.access$200(ProviderList.java:376)
at
sun.security.jca.ProviderList$ServiceList$1.hasNext(ProviderList.java:486)
at javax.crypto.Cipher.getInstance(Cipher.java:513)
at sun.security.ssl.JsseJce.getCipher(JsseJce.java:229)
at sun.security.ssl.CipherBox.<init>(CipherBox.java:179)
at sun.security.ssl.CipherBox.newCipherBox(CipherBox.java:263)
at
sun.security.ssl.CipherSuite$BulkCipher.newCipher(CipherSuite.java:505)
at
sun.security.ssl.CipherSuite$BulkCipher.isAvailable(CipherSuite.java:572)
- locked <0x0000000080ac88f0> (a java.lang.Class for
sun.security.ssl.CipherSuite$BulkCipher)
at
sun.security.ssl.CipherSuite$BulkCipher.isAvailable(CipherSuite.java:527)
at sun.security.ssl.CipherSuite.isAvailable(CipherSuite.java:194)
at
sun.security.ssl.SSLContextImpl.getApplicableCipherSuiteList(SSLContextImpl.java:346)
at
sun.security.ssl.SSLContextImpl.getDefaultCipherSuiteList(SSLContextImpl.java:297)
- locked <0x000000008ebd1880> (a
sun.security.ssl.SSLContextImpl$TLSContext)
at sun.security.ssl.SSLEngineImpl.init(SSLEngineImpl.java:402)
at sun.security.ssl.SSLEngineImpl.<init>(SSLEngineImpl.java:349)
at
sun.security.ssl.SSLContextImpl.engineCreateSSLEngine(SSLContextImpl.java:201)
at javax.net.ssl.SSLContext.createSSLEngine(SSLContext.java:329)
...
Note that multiple threads are waiting on "0x00000000802ba7f8, a
sun.security.provider.Sun" but no thread is listed as holding this in
its stacks.
The additional thread info from higher up in the jstack is:
"Thread-38190" #2454575 prio=5 os_prio=0 tid=0x0000000040fff000
nid=0x8b12 waiting for monitor entry [0x00007ff9a129e000]
"New I/O worker #1810850" #2448031 daemon prio=5 os_prio=0
tid=0x00007ff87df00000 nid=0x42bf waiting for monitor entry
[0x00007ff918a61000]
"New I/O worker #1810853" #2448034 daemon prio=5 os_prio=0
tid=0x00007ff87df02800 nid=0x42c4 waiting for monitor entry
[0x00007ff911654000]
Native frames look like normal lock acquisition:
0x00007ffa4e6546d5 __pthread_cond_wait + 0xc5
0x00007ffa50516f7d _ZN13ObjectMonitor6EnterIEP6Thread + 0x31d
0x00007ffa50518f11 _ZN13ObjectMonitor5enterEP6Thread + 0x301
0x00007ffa505cc8ff
_ZN13SharedRuntime26complete_monitor_locking_CEP7oopDescP9BasicLockP10JavaThread
+ 0x9f
Other relevant info:
- JVM: Java HotSpot(TM) 64-Bit Server VM (25.102-b14 mixed mode)
(jdk1.8.0_102)
- OS: RHEL 7.2 (kernel 3.10.0-327.36.2.el7.x86_64) [not that
infamous futex bug]
- libc version: glibc-2.17-157.el7.x86_64
- nothing interesting in dmesg
- this app actually is a native application which embeds a JVM using
JNI, if that makes a difference.
Anyone else seen issues like this with relatively recent Java 8? Or
any ideas on next steps to debug the phantom lock holder?
-Todd
--
You received this message because you are subscribed to the Google
Groups "mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to [email protected]
<mailto:[email protected]>.
For more options, visit https://groups.google.com/d/optout.