FW: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)

Uwe Schindler Wed, 06 Mar 2013 04:35:17 -0800

They have already understood the G1GC problem with JDK 8 b78/b79 and are working on a 
fix. This was really fast:
http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2013-March/006128.html


Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]

--- Begin Message ---
Hi all,
I sent this email earlier, but I did "reply list" instead of "reply all". Sorry about that.
The hang is due to the fact that we are using single threaded reference processing but end up in the multi threaded code path and get stuck in a loop that waits for the other processing threads to terminate.
John Cuthbertson is working on a fix for this. I think we have all the information we need to solve this.
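
The failure mode described above can be illustrated with a toy Java sketch. This is not the HotSpot code: the class and method names are made up for illustration, and a timed wait is used so that the example terminates instead of hanging. It only shows the general pattern of a coordinator that waits for more workers to check in than were actually started.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not taken from the JVM sources.
public class TerminationMismatchSketch {

    /**
     * Starts `started` workers but waits for `expected` of them to report
     * completion. Returns true if all expected workers checked in before
     * the timeout, false if the wait gave up.
     */
    static boolean runAndAwait(int started, int expected, long timeoutMillis) {
        CountDownLatch done = new CountDownLatch(expected);
        for (int i = 0; i < started; i++) {
            Thread worker = new Thread(done::countDown); // each worker checks in once
            worker.setDaemon(true);
            worker.start();
        }
        try {
            // An untimed await() here would block forever when started < expected,
            // which is the analogue of the hang discussed in this thread.
            return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("1 started, 1 expected: " + runAndAwait(1, 1, 2000));
        System.out.println("1 started, 4 expected: " + runAndAwait(1, 4, 200));
    }
}
```

With matching counts the wait completes; with one worker started and four expected, the coordinator can only be released by the timeout.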
Bengt

On 3/6/13 9:04 AM, Bengt Rutisson wrote:
David,
I think this is a VM bug and the thread dumps that Uwe produced are enough to start tracking down the root cause.
On 3/6/13 8:52 AM, David Holmes wrote:
If the VM is completely unresponsive then it suggests we are at a safepoint.
Yes, we are hanging during a stop-the-world GC, so we are at a safepoint.
The GC threads are not "hung" in os::park, they are parked - waiting to be notified of something.
It looks like the reference processing thread is stuck in a loop where it does wait(). So, the VM is hanging even if that stack trace also ends up in os::park().
The thing is to find out why they are not being woken up.
Actually, in this case we should probably not even be calling wait...
Can the gdb log be posted somewhere? I don't know if the attachment made it to the original posting on hotspot-gc but it's no longer available on hotspot-dev.
I received the attachment with the original email. I've attached it to the bug report that I created: 8009536. You can find it there if you want to. But I think we have a fairly good idea of what change caused the hang.
Bengt
Thanks,
David

On 6/03/2013 4:07 PM, Krystal Mok wrote:
Hi Uwe,

If you can attach gdb to it, then jstack -m and jstack -F should also
work; that'll get you the Java stack trace.
(But it probably doesn't matter in this case, because the hang is
probably a bug in the VM.)

- Kris
On Wed, Mar 6, 2013 at 5:48 AM, Uwe Schindler <[email protected]> wrote:
Hi,
For a few months we have been extensively testing various preview builds of JDK 8 for compatibility with Apache Lucene and Solr, so we can find any bugs early and prevent the problems we had with the release of Java 7 two years ago. Currently we have a Linux (Ubuntu 64bit) Jenkins machine that has various JDKs (JDK 6, JDK 7, JDK 8 snapshot, IBM J9, older JRockit) installed, choosing a different one with different hotspot and garbage collector settings on every run of the test suite (which takes approx. 30-45 minutes).
JDK 8 b79 works very well on Linux so far; we found some strange behavior in early versions (maybe compiler errors), but none at the moment. There is one configuration that constantly and reproducibly hangs in one module that is tested: the configuration uses JDK 8 b79 (same for b78), 32 bit, and G1GC (server or client does not matter). The JVM running the tests hangs and becomes unresponsive (jstack or kill -3 have no effect/cannot connect, standard kill does not stop it, only kill -9 actually kills it). It can be reproduced in this Lucene module 100% of the time (it always hangs).
I was able to connect with GDB to the JVM and get a stack trace on all threads (see attachment, dump.txt). As you see, all threads of G1GC seem to hang in a syscall (os::park(), a conditional wait in the pthread library). Unfortunately that’s all I can give you. A Java stack trace is not possible because the JVM reacts to neither kill -3 nor jstack. With all other garbage collectors it passes the test without hangs in a few seconds; with 32 bit G1GC it can stand still for hours. The 64 bit JVM passes with G1GC, so only the 32 bit variant is affected. Client or Server VM makes no difference.
To reproduce:
- Use a 32 bit JDK 8 b78 or b79 (tested on Linux 64 bit, but this should not matter)
- Download Lucene Source code (e.g. the snapshot version we were testing with:
https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/)
- change to directory lucene/analysis/uima and run:
ant -Dargs="-server -XX:+UseG1GC" -Dtests.multiplier=3 -Dtests.jvms=1 test
After a while the test framework prints "stalled" messages (because the child VM actually running the test no longer responds). The PID is also printed. Try to get a stack trace or kill it, no response. Only kill -9 helps. Choosing another garbage collector in the above command line makes the test finish after a few seconds, e.g. -Dargs="-server -XX:+UseConcMarkSweepGC"
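
For anyone scripting the reproduction, the "stalled child that only kill -9 stops" situation can be handled by a small watchdog. The sketch below is only an assumption about how one might do it and is not how the Lucene test framework works: the class name, method name and timeout are hypothetical. It uses the Java 8 Process API, where destroyForcibly() requests a forcible kill (on Linux this is, as far as I know, delivered as SIGKILL).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; names and timeout values are illustrative.
public class StalledChildWatchdog {

    /**
     * Runs the command as a child process. Returns true if it exits on its
     * own within the timeout; otherwise kills it forcibly and returns false.
     */
    static boolean finishesWithin(List<String> command, long timeoutSeconds) {
        try {
            Process child = new ProcessBuilder(command).inheritIO().start();
            if (child.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                return true;              // child exited by itself
            }
            child.destroyForcibly();      // forcible termination of the stalled child
            child.waitFor();              // wait until the child is actually gone
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            // The watchdog itself was interrupted; the child is left as it is.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("usage: StalledChildWatchdog <command> [args...]");
            return;
        }
        boolean ok = finishesWithin(Arrays.asList(args), 60);
        System.out.println(ok ? "finished" : "stalled, killed");
    }
}
```

The command to run is passed on the command line, for example the ant invocation above; the 60 second limit in main is an arbitrary placeholder.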
I posted this bug report directly to the mailing list, because with earlier bug reports, there seems to be a problem with bugs.sun.com - there is no response from any reviewer after several weeks and we were able to help to find and fix javadoc and javac-compiler bugs early. So I hope you can help with this bug, too.
Uwe

-----
Uwe Schindler
[email protected]
Apache Lucene PMC Member / Committer
Bremen, Germany
http://lucene.apache.org/
--- End Message ---
