Thanks Gil (as always!). Whether or not 'the software is the computer' or 'the silicon is the computer' or 'the network is the computer' ... how about this:
The COMPUTATION is the computer?

With admiration and appreciation,
Ben

On Thu, Oct 12, 2017 at 12:09 PM Gil Tene <[email protected]> wrote:

> The "machine" we run on is not just the hardware. It's also the BIOS, the
> hypervisor, the kernel, the container system, the system libraries, and the
> various runtimes. Stuff that is "interesting" and "strange" about how the
> machine seems to behave is very appropriate to this group, IMO. And
> mysterious deadlock behaviors and related issues at the machine level (e.g.
> an apparent deadlock where no lock owner appears to exist, as is the case
> discussed here) are certainly "interesting" machine behavior. Java or not,
> libposix or not, Linux or not, C#/C++/Rust or not. The fact that much of
> concurrency work happens to be done in Java does make Java and JVMs a more
> common context in which these issues are discussed, but the same can be
> said about Linux, even though this is not a Linux support group.
>
> E.g. the discussion we had a while back about Linux's futex wakeup bug
> <https://groups.google.com/forum/#!topic/mechanical-sympathy/QbmpZxp6C64>
> (where certain kernel versions failed to wake up futexes, creating (among
> other things) apparent deadlocks in non-deadlocking code) was presumably
> appropriate for this group. I see Todd's query as no different. It is not a
> "my program has a deadlock" question. It is an observed "deadlock that
> isn't a deadlock" question. It may be a bug in tooling/reporting (e.g.
> tooling might be deducing the deadlock based on non-atomic sampling of
> stack state, as some have suggested here), or it may be a bug in the lock
> mechanisms (e.g. a failed wakeup at the JVM level). Either way, it would be
> just as interesting and relevant here as a failed wakeup or wrong lock
> state instrumentation at the Linux kernel level, or at the .NET CLR level,
> or at the golang runtime level, etc...
>
> On Wednesday, October 11, 2017 at 6:06:16 AM UTC-7, Jarkko Miettinen wrote:
>>
>> On Wednesday, October 11, 2017 at 11:27:17 UTC+3, Avi Kivity wrote:
>>>
>>> If this is not off topic, what is the topic of this group?
>>>
>>> Is it a Java support group, or a "coding in a way that exploits the way
>>> the hardware works" group?
>>>
>>
>> I have to agree here with Avi. Better to have a group for "coding in a
>> way that exploits the way the hardware works" and another group for Java
>> support. Otherwise there will be a lot of discussion with no real relation
>> to the topic except that people exploiting mechanical sympathy might have
>> run into such problems.
>>
>> (I would also be interested in a Java support group for the level of
>> problems that have been posted in this group before.)
>>
>>>
>>> On 10/11/2017 10:29 AM, Kirk Pepperdine wrote:
>>>
>>> Not at all off topic… first, thread dumps lie like a rug… and here is
>>> why…
>>>
>>> for each thread {
>>>     safe point
>>>     create stack trace for that thread
>>>     release threads from safe point
>>> }
>>>
>>> And while rugs may attempt to cover the debris that you’ve swept under
>>> them, that debris leaves a clearly visible lump that suggests that you have
>>> a congestion problem on locks in both sun.security.provider.Sun and
>>> java.lang.Class…. What could possibly go wrong?
>>>
>>> Kind regards,
>>> Kirk
>>>
>>> On Oct 11, 2017, at 3:05 AM, Todd Lipcon <[email protected]> wrote:
>>>
>>> Hey folks,
>>>
>>> Apologies for the slightly off-topic post, since this isn't performance
>>> related, but I hope I'll be excused since this might be interesting to the
>>> group members.
>>>
>>> We've recently been facing an issue where a JVM is deadlocking in some SSL
>>> code. The resulting jstack report is bizarre -- in the deadlock analysis
>>> section it indicates that one of the locks is held by some thread, but in
>>> that thread's stack, it doesn't show the lock anywhere.
>>> Was curious if anyone had any ideas on how a lock might be "held but
>>> not held".
>>>
>>> jstack output is as follows (with other threads and irrelevant bottom
>>> stack frames removed):
>>>
>>> Found one Java-level deadlock:
>>> =============================
>>> "Thread-38190":
>>>   waiting to lock monitor 0x00000000267f2628 (object 0x00000000802ba7f8, a sun.security.provider.Sun),
>>>   which is held by "New I/O worker #1810850"
>>> "New I/O worker #1810850":
>>>   waiting to lock monitor 0x000000007482f5f8 (object 0x0000000080ac88f0, a java.lang.Class),
>>>   which is held by "New I/O worker #1810853"
>>> "New I/O worker #1810853":
>>>   waiting to lock monitor 0x00000000267f2628 (object 0x00000000802ba7f8, a sun.security.provider.Sun),
>>>   which is held by "New I/O worker #1810850"
>>>
>>> Java stack information for the threads listed above:
>>> ===================================================
>>> "Thread-38190":
>>>     at java.security.Provider.getService(Provider.java:1035)
>>>     - waiting to lock <0x00000000802ba7f8> (a sun.security.provider.Sun)
>>>     at sun.security.jca.ProviderList.getService(ProviderList.java:332)
>>>     at sun.security.jca.GetInstance.getInstance(GetInstance.java:157)
>>>     at javax.net.ssl.SSLContext.getInstance(SSLContext.java:156)
>>>     at org.apache.kudu.client.SecurityContext.<init>(SecurityContext.java:84)
>>>     ...
>>> "New I/O worker #1810850":
>>>     at sun.security.ssl.CipherSuite$BulkCipher.isAvailable(CipherSuite.java:542)
>>>     - waiting to lock <0x0000000080ac88f0> (a java.lang.Class for sun.security.ssl.CipherSuite$BulkCipher)
>>>     at sun.security.ssl.CipherSuite$BulkCipher.isAvailable(CipherSuite.java:527)
>>>     at sun.security.ssl.CipherSuite.isAvailable(CipherSuite.java:194)
>>>     at sun.security.ssl.Handshaker.getActiveProtocols(Handshaker.java:712)
>>>     at sun.security.ssl.Handshaker.activate(Handshaker.java:498)
>>>     at sun.security.ssl.SSLEngineImpl.kickstartHandshake(SSLEngineImpl.java:729)
>>>     - locked <0x000000008eca76f8> (a sun.security.ssl.SSLEngineImpl)
>>>     at sun.security.ssl.SSLEngineImpl.beginHandshake(SSLEngineImpl.java:756)
>>>     at org.apache.kudu.client.shaded.org.jboss.netty.handler.ssl.SslHandler.handshake(SslHandler.java:361)
>>>     - locked <0x000000008eca8298> (a java.lang.Object)
>>>     at org.apache.kudu.client.Negotiator.startTlsHandshake(Negotiator.java:432)
>>>     ...
>>> "New I/O worker #1810853":
>>>     at java.security.Provider.getService(Provider.java:1035)
>>>     - waiting to lock <0x00000000802ba7f8> (a sun.security.provider.Sun)
>>>     at sun.security.jca.ProviderList$ServiceList.tryGet(ProviderList.java:444)
>>>     at sun.security.jca.ProviderList$ServiceList.access$200(ProviderList.java:376)
>>>     at sun.security.jca.ProviderList$ServiceList$1.hasNext(ProviderList.java:486)
>>>     at javax.crypto.Cipher.getInstance(Cipher.java:513)
>>>     at sun.security.ssl.JsseJce.getCipher(JsseJce.java:229)
>>>     at sun.security.ssl.CipherBox.<init>(CipherBox.java:179)
>>>     at sun.security.ssl.CipherBox.newCipherBox(CipherBox.java:263)
>>>     at sun.security.ssl.CipherSuite$BulkCipher.newCipher(CipherSuite.java:505)
>>>     at sun.security.ssl.CipherSuite$BulkCipher.isAvailable(CipherSuite.java:572)
>>>     - locked <0x0000000080ac88f0> (a java.lang.Class for sun.security.ssl.CipherSuite$BulkCipher)
>>>     at sun.security.ssl.CipherSuite$BulkCipher.isAvailable(CipherSuite.java:527)
>>>     at sun.security.ssl.CipherSuite.isAvailable(CipherSuite.java:194)
>>>     at sun.security.ssl.SSLContextImpl.getApplicableCipherSuiteList(SSLContextImpl.java:346)
>>>     at sun.security.ssl.SSLContextImpl.getDefaultCipherSuiteList(SSLContextImpl.java:297)
>>>     - locked <0x000000008ebd1880> (a sun.security.ssl.SSLContextImpl$TLSContext)
>>>     at sun.security.ssl.SSLEngineImpl.init(SSLEngineImpl.java:402)
>>>     at sun.security.ssl.SSLEngineImpl.<init>(SSLEngineImpl.java:349)
>>>     at sun.security.ssl.SSLContextImpl.engineCreateSSLEngine(SSLContextImpl.java:201)
>>>     at javax.net.ssl.SSLContext.createSSLEngine(SSLContext.java:329)
>>>     ...
>>>
>>> Note that multiple threads are waiting on "0x00000000802ba7f8, a
>>> sun.security.provider.Sun" but no thread is listed as holding this in its
>>> stacks.
>>>
>>> The additional thread info from higher up in the jstack is:
>>>
>>> "Thread-38190" #2454575 prio=5 os_prio=0 tid=0x0000000040fff000 nid=0x8b12 waiting for monitor entry [0x00007ff9a129e000]
>>> "New I/O worker #1810850" #2448031 daemon prio=5 os_prio=0 tid=0x00007ff87df00000 nid=0x42bf waiting for monitor entry [0x00007ff918a61000]
>>> "New I/O worker #1810853" #2448034 daemon prio=5 os_prio=0 tid=0x00007ff87df02800 nid=0x42c4 waiting for monitor entry [0x00007ff911654000]
>>>
>>> Native frames look like normal lock acquisition:
>>> 0x00007ffa4e6546d5 __pthread_cond_wait + 0xc5
>>> 0x00007ffa50516f7d _ZN13ObjectMonitor6EnterIEP6Thread + 0x31d
>>> 0x00007ffa50518f11 _ZN13ObjectMonitor5enterEP6Thread + 0x301
>>> 0x00007ffa505cc8ff _ZN13SharedRuntime26complete_monitor_locking_CEP7oopDescP9BasicLockP10JavaThread + 0x9f
>>>
>>> Other relevant info:
>>> - JVM: Java HotSpot(TM) 64-Bit Server VM (25.102-b14 mixed mode) (jdk1.8.0_102)
>>> - OS: RHEL 7.2 (kernel 3.10.0-327.36.2.el7.x86_64) [not that infamous futex bug]
>>> - libc version: glibc-2.17-157.el7.x86_64
>>> - nothing interesting in dmesg
>>> - this app actually is a native application which embeds a JVM using
>>>   JNI, if that makes a difference.
>>>
>>> Anyone else seen issues like this with relatively recent Java 8? Or any
>>> ideas on next steps to debug the phantom lock holder?
>>>
>>> -Todd
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "mechanical-sympathy" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
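
The "Found one Java-level deadlock" section of the report quoted above is, in effect, a wait-for graph: each "waiting to lock ... which is held by" pair is an edge from a blocked thread to the reported owner. As a minimal sketch, the Java below follows those edges to show which threads form the cycle and which are merely queued behind it. The class and method names (WaitForGraph, cycleFrom, reportedEdges) are hypothetical and not part of jstack or the JDK; the only data used are the three thread names copied from the report. It does not answer whether the reported owner really holds the monitor, which is the open question in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WaitForGraph {

    /**
     * Follows "waiting on a lock held by" edges starting at the given thread.
     * Returns the threads that form the cycle reached from it, or an empty
     * list if the chain ends at a thread that is not waiting on anyone.
     */
    public static List<String> cycleFrom(Map<String, String> heldBy, String start) {
        List<String> path = new ArrayList<>();
        String current = start;
        // Walk until the chain ends or we revisit a thread already on the path.
        while (current != null && !path.contains(current)) {
            path.add(current);
            current = heldBy.get(current);
        }
        if (current == null) {
            return new ArrayList<>();
        }
        // Drop the threads that only lead into the loop; keep the loop itself.
        return new ArrayList<>(path.subList(path.indexOf(current), path.size()));
    }

    /** Edges copied from the deadlock section of the quoted jstack report. */
    public static Map<String, String> reportedEdges() {
        Map<String, String> heldBy = new LinkedHashMap<>();
        heldBy.put("Thread-38190", "New I/O worker #1810850");
        heldBy.put("New I/O worker #1810850", "New I/O worker #1810853");
        heldBy.put("New I/O worker #1810853", "New I/O worker #1810850");
        return heldBy;
    }

    public static void main(String[] args) {
        List<String> cycle = cycleFrom(reportedEdges(), "Thread-38190");
        System.out.println("Cycle: " + cycle);
    }
}
```

With the edges from the report, the cycle consists of the two "New I/O worker" threads, while "Thread-38190" is only blocked behind it. For an in-process view of the same edges, ThreadMXBean.findDeadlockedThreads() and ThreadInfo.getLockOwnerName() in java.lang.management are the standard JDK entry points.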
