If this is not off topic, what is the topic of this group?


Is it a Java support group, or a "coding in a way that exploits the way the hardware works" group?



On 10/11/2017 10:29 AM, Kirk Pepperdine wrote:
Not at all off topic… first, thread dumps lie like a rug… and here is why…

for each thread {
    safe point
    create stack trace for that thread
    release threads from safe point
}

And while rugs may attempt to cover the debris that you’ve swept under them, that debris leaves a clearly visible lump that suggests that you have a congestion problem on locks in both sun.security.provider.Sun and java.lang.Class…. What could possibly go wrong?


Kind regards,
Kirk

On Oct 11, 2017, at 3:05 AM, Todd Lipcon <[email protected] <mailto:[email protected]>> wrote:

Hey folks,

Apologies for the slightly off-topic post, since this isn't performance related, but I hope I'll be excused since this might be interesting to the group members.

We're recently facing an issue where a JVM is deadlocking in some SSL code. The resulting jstack report is bizarre -- in the deadlock analysis section it indicates that one of the locks is held by some thread, but in that thread's stack, it doesn't show the lock anywhere. Was curious if anyone had any ideas on how a lock might be "held but not held".

jstack output is as follows (with other threads and irrelevant bottom stack frames removed):

Found one Java-level deadlock:
=============================
"Thread-38190":
  waiting to lock monitor 0x00000000267f2628 (object 0x00000000802ba7f8, a sun.security.provider.Sun),
  which is held by "New I/O worker #1810850"
"New I/O worker #1810850":
  waiting to lock monitor 0x000000007482f5f8 (object 0x0000000080ac88f0, a java.lang.Class),
  which is held by "New I/O worker #1810853"
"New I/O worker #1810853":
  waiting to lock monitor 0x00000000267f2628 (object 0x00000000802ba7f8, a sun.security.provider.Sun),
  which is held by "New I/O worker #1810850"

Java stack information for the threads listed above:
===================================================
"Thread-38190":
        at java.security.Provider.getService(Provider.java:1035)
        - waiting to lock <0x00000000802ba7f8> (a sun.security.provider.Sun)         at sun.security.jca.ProviderList.getService(ProviderList.java:332)
        at sun.security.jca.GetInstance.getInstance(GetInstance.java:157)
        at javax.net.ssl.SSLContext.getInstance(SSLContext.java:156)
        at org.apache.kudu.client.SecurityContext.<init>(SecurityContext.java:84)
        ...
"New I/O worker #1810850":
        at sun.security.ssl.CipherSuite$BulkCipher.isAvailable(CipherSuite.java:542)         - waiting to lock <0x0000000080ac88f0> (a java.lang.Class for sun.security.ssl.CipherSuite$BulkCipher)         at sun.security.ssl.CipherSuite$BulkCipher.isAvailable(CipherSuite.java:527)
        at sun.security.ssl.CipherSuite.isAvailable(CipherSuite.java:194)
        at sun.security.ssl.Handshaker.getActiveProtocols(Handshaker.java:712)
        at sun.security.ssl.Handshaker.activate(Handshaker.java:498)
        at sun.security.ssl.SSLEngineImpl.kickstartHandshake(SSLEngineImpl.java:729)
        - locked <0x000000008eca76f8> (a sun.security.ssl.SSLEngineImpl)
        at sun.security.ssl.SSLEngineImpl.beginHandshake(SSLEngineImpl.java:756)         at org.apache.kudu.client.shaded.org.jboss.netty.handler.ssl.SslHandler.handshake(SslHandler.java:361)
        - locked <0x000000008eca8298> (a java.lang.Object)
        at org.apache.kudu.client.Negotiator.startTlsHandshake(Negotiator.java:432)
        ...
"New I/O worker #1810853":
        at java.security.Provider.getService(Provider.java:1035)
        - waiting to lock <0x00000000802ba7f8> (a sun.security.provider.Sun)         at sun.security.jca.ProviderList$ServiceList.tryGet(ProviderList.java:444)         at sun.security.jca.ProviderList$ServiceList.access$200(ProviderList.java:376)         at sun.security.jca.ProviderList$ServiceList$1.hasNext(ProviderList.java:486)
        at javax.crypto.Cipher.getInstance(Cipher.java:513)
        at sun.security.ssl.JsseJce.getCipher(JsseJce.java:229)
        at sun.security.ssl.CipherBox.<init>(CipherBox.java:179)
        at sun.security.ssl.CipherBox.newCipherBox(CipherBox.java:263)
        at sun.security.ssl.CipherSuite$BulkCipher.newCipher(CipherSuite.java:505)         at sun.security.ssl.CipherSuite$BulkCipher.isAvailable(CipherSuite.java:572)         - locked <0x0000000080ac88f0> (a java.lang.Class for sun.security.ssl.CipherSuite$BulkCipher)         at sun.security.ssl.CipherSuite$BulkCipher.isAvailable(CipherSuite.java:527)
        at sun.security.ssl.CipherSuite.isAvailable(CipherSuite.java:194)
        at sun.security.ssl.SSLContextImpl.getApplicableCipherSuiteList(SSLContextImpl.java:346)         at sun.security.ssl.SSLContextImpl.getDefaultCipherSuiteList(SSLContextImpl.java:297)         - locked <0x000000008ebd1880> (a sun.security.ssl.SSLContextImpl$TLSContext)
        at sun.security.ssl.SSLEngineImpl.init(SSLEngineImpl.java:402)
        at sun.security.ssl.SSLEngineImpl.<init>(SSLEngineImpl.java:349)
        at sun.security.ssl.SSLContextImpl.engineCreateSSLEngine(SSLContextImpl.java:201)
        at javax.net.ssl.SSLContext.createSSLEngine(SSLContext.java:329)
        ...


Note that multiple threads are waiting on "0x00000000802ba7f8, a sun.security.provider.Sun" but no thread is listed as holding this in its stacks.

The additional thread info from higher up in the jstack is:

"Thread-38190" #2454575 prio=5 os_prio=0 tid=0x0000000040fff000 nid=0x8b12 waiting for monitor entry [0x00007ff9a129e000] "New I/O worker #1810850" #2448031 daemon prio=5 os_prio=0 tid=0x00007ff87df00000 nid=0x42bf waiting for monitor entry [0x00007ff918a61000] "New I/O worker #1810853" #2448034 daemon prio=5 os_prio=0 tid=0x00007ff87df02800 nid=0x42c4 waiting for monitor entry [0x00007ff911654000]

Native frames look like normal lock acquisition:
0x00007ffa4e6546d5  __pthread_cond_wait + 0xc5
0x00007ffa50516f7d  _ZN13ObjectMonitor6EnterIEP6Thread + 0x31d
0x00007ffa50518f11  _ZN13ObjectMonitor5enterEP6Thread + 0x301
0x00007ffa505cc8ff  _ZN13SharedRuntime26complete_monitor_locking_CEP7oopDescP9BasicLockP10JavaThread + 0x9f

Other relevant info:
- JVM: Java HotSpot(TM) 64-Bit Server VM (25.102-b14 mixed mode)   (jdk1.8.0_102) - OS: RHEL 7.2 (kernel 3.10.0-327.36.2.el7.x86_64)  [not that infamous futex bug]
- libc version: glibc-2.17-157.el7.x86_64
- nothing interesting in dmesg
- this app actually is a native application which embeds a JVM using JNI, if that makes a difference.


Anyone else seen issues like this with relatively recent Java 8? Or any ideas on next steps to debug the phantom lock holder?

-Todd

--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected] <mailto:[email protected]>.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected] <mailto:[email protected]>.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups 
"mechanical-sympathy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Reply via email to