Awesome work all!
On Thu, Mar 7, 2013 at 7:24 AM, Bengt Rutisson <bengt.rutis...@oracle.com> wrote:
>
> John and Uwe,
>
> I followed the original instructions sent out by Uwe to reproduce the
> test. I got it up and running on my Windows x64 workstation using a
> 32 bit binary. The test hangs every time I run it.
>
> John, I think your proxy issues are due to the fact that ant picks up
> its proxy settings from Java. So you need to set the system properties
> http.proxyHost and http.proxyPort. I did this by exporting the
> _JAVA_OPTIONS environment variable as:
>
> _JAVA_OPTIONS=-Dhttp.proxyHost=<oracle www proxy> -Dhttp.proxyPort=<oracle proxy port>
>
> Let me know if this does not work for you. We can try to debug it
> offline.
>
> Since I could catch the hang in a debugger I could confirm both that
> the hang is indeed related to the recent change to the
> DrainMarkingStackClosures and that the problem is that we enter the
> termination protocol even when reference processing is single
> threaded.
>
> Looking at the comment in the constructor for
> G1CMDrainMarkingStackClosure:
>
>   // We only allow stealing and only enter the termination protocol
>   // in CMTask::do_marking_step() if this closure is being instantiated
>   // for parallel reference processing.
>   _do_stealing = _do_termination = is_par;
>
> I came up with a patch that makes the test work again. But I leave it
> to you, John, to figure out if this is the right way to solve the
> problem.
>
> diff --git a/src/share/vm/gc_implementation/g1/concurrentMark.cpp b/src/share/vm/gc_implementation/g1/concurrentMark.cpp
> --- a/src/share/vm/gc_implementation/g1/concurrentMark.cpp
> +++ b/src/share/vm/gc_implementation/g1/concurrentMark.cpp
> @@ -4336,7 +4336,9 @@
>          gclog_or_tty->print_cr("[%u] detected overflow", _worker_id);
>        }
>
> +      if (do_stealing || do_termination) {
>        _cm->enter_first_sync_barrier(_worker_id);
> +      }
>        // When we exit this sync barrier we know that all tasks have
>        // stopped doing marking work. So, it's now safe to
>        // re-initialise our data structures. At the end of this method,
> @@ -4347,8 +4349,10 @@
>        // We clear the local state of this task...
>        clear_region_fields();
>
> +      if (do_stealing || do_termination) {
>        // ...and enter the second barrier.
>        _cm->enter_second_sync_barrier(_worker_id);
> +      }
>        // At this point everything has bee re-initialised and we're
>        // ready to restart.
>      }
>
> Thanks,
> Bengt
>
>
> On 3/7/13 7:44 AM, Uwe Schindler wrote:
>
> Hi John,
>
> I only have time to work on a setup this evening German time, because
> I am on a business trip today. Will come back to you. Unfortunately I
> failed to quickly set up an easy classpath without Ivy downloading the
> JARs.
>
> Uwe
>
> -----
> Uwe Schindler
> uschind...@apache.org
> Apache Lucene PMC Member / Committer
> Bremen, Germany
> http://lucene.apache.org/
>
> From: John Cuthbertson [mailto:john.cuthbert...@oracle.com]
> Sent: Thursday, March 07, 2013 12:49 AM
> To: Uwe Schindler
> Cc: 'Bengt Rutisson'; hotspot-gc-...@openjdk.java.net; dev@lucene.apache.org
> Subject: Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)
>
> Hi Uwe,
>
> An update:
>
> I have downloaded ant and the Lucene source.
>
> I attempted the ivy-bootstrap but it failed to download the
> ivy-2.3.0.jar file - even after setting:
>
> ANT_OPTS=-Dhttp.proxyHost=<...> -Dhttp.proxyPort=<...>
>
> So I manually downloaded and placed it into the ANT library and now
> get:
>
> ivy-bootstrap1:
>     [mkdir] Skipping /home/jcuthber/.ant/lib because it already exists.
>     [echo] installing ivy 2.3.0 to /home/jcuthber/.ant/lib
>     [get] Getting: http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
>     [get] To: /home/jcuthber/.ant/lib/ivy-2.3.0.jar
>     [get] Error getting http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar to /home/jcuthber/.ant/lib/ivy-2.3.0.jar
>     [available] Found: /home/jcuthber/.ant/lib/ivy-2.3.0.jar
>
> ivy-bootstrap2:
> Skipped because property 'ivy.bootstrap1.success' set.
>
> ivy-checksum:
>
> ivy-bootstrap:
>
> BUILD SUCCESSFUL
> Total time: 3 minutes 46 seconds
>
> Presumably I have to build the Lucene source before executing the
> tests. That seemed to go OK.
>
> When I run the analysis/uima tests it seems to get hung up at the
> "resolve" target - even without specifying G1:
>
> cairnapple{jcuthber}:408> cd analysis/uima/
> cairnapple{jcuthber}:409> ls -l
> total 29
> -rw-r--r-- 1 jcuthber staff 1473 Dec 10 10:39 build.xml
> -rw-rw-r-- 1 jcuthber staff 6895 Mar  6 15:20 hotspot.log
> -rw-r--r-- 1 jcuthber staff 1316 Mar 30  2012 ivy.xml
> drwxr-xr-x 2 jcuthber staff    2 Mar  5 07:42 lib/
> drwxr-xr-x 6 jcuthber staff    6 Mar  5 07:42 src/
>
> ivy-configure:
> [ivy:configure] Loading jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivy.properties
> [ivy:configure] :: Apache Ivy 2.3.0 - 20130110142753 :: http://ant.apache.org/ivy/ ::
> [ivy:configure] jakarta commons httpclient not found: using jdk url handling
> [ivy:configure] :: loading settings :: file = /export/bugs/8009536/lucene-5.0-2013-03-05_15-37-06/ivy-settings.xml
> [ivy:configure] no default ivy user dir defined: set to /home/jcuthber/.ivy2
> [ivy:configure] including url: jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings-public.xml
> [ivy:configure] no default cache defined: set to /home/jcuthber/.ivy2/cache
> [ivy:configure] including url: jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings-shared.xml
> [ivy:configure] including url: jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings-local.xml
> [ivy:configure] including url: jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings-main-chain.xml
> [ivy:configure] settings loaded (289ms)
> [ivy:configure] default cache: /home/jcuthber/.ivy2/cache
> [ivy:configure] default resolver: default
> [ivy:configure] -- 7 resolvers:
> [ivy:configure]     working-chinese-mirror [ibiblio]
> [ivy:configure]     main [chain] [shared, public]
> [ivy:configure]     local [file]
> [ivy:configure]     shared [file]
> [ivy:configure]     sonatype-releases [ibiblio]
> [ivy:configure]     public [ibiblio]
> [ivy:configure]     default [chain] [local, main, sonatype-releases, working-chinese-mirror]
>
> resolve:
> [ivy:retrieve] no resolved descriptor found: launching default resolve
> Overriding previous definition of property "ivy.version"
> [ivy:retrieve] using ivy parser to parse file:/export/bugs/8009536/lucene-5.0-2013-03-05_15-37-06/analysis/uima/ivy.xml
> [ivy:retrieve] :: resolving dependencies :: org.apache.lucene#analyzers-uima;working@cairnapple
> [ivy:retrieve]     confs: [default]
> [ivy:retrieve]     validate = true
> [ivy:retrieve]     refresh = false
> [ivy:retrieve] resolving dependencies for configuration 'default'
> [ivy:retrieve] == resolving dependencies for org.apache.lucene#analyzers-uima;working@cairnapple [default]
> [ivy:retrieve] == resolving dependencies org.apache.lucene#analyzers-uima;working@cairnapple->org.apache.uima#Tagger;2.3.1 [default->*]
> [ivy:retrieve] default: Checking cache for: dependency: org.apache.uima#Tagger;2.3.1 {*=[*]}
> [ivy:retrieve] don't use cache for org.apache.uima#Tagger;2.3.1: checkModified=true
> [ivy:retrieve]     tried /home/jcuthber/.ivy2/local/org.apache.uima/Tagger/2.3.1/ivys/ivy.xml
> [ivy:retrieve]     tried /home/jcuthber/.ivy2/local/org.apache.uima/Tagger/2.3.1/jars/Tagger.jar
> [ivy:retrieve] local: no ivy file nor artifact found for org.apache.uima#Tagger;2.3.1
> [ivy:retrieve] main: Checking cache for: dependency: org.apache.uima#Tagger;2.3.1 {*=[*]}
> [ivy:retrieve]     tried /home/jcuthber/.ivy2/shared/org.apache.uima/Tagger/2.3.1/ivys/ivy.xml
> [ivy:retrieve]     tried /home/jcuthber/.ivy2/shared/org.apache.uima/Tagger/2.3.1/jars/Tagger.jar
> [ivy:retrieve] shared: no ivy file nor artifact found for org.apache.uima#Tagger;2.3.1
> [ivy:retrieve]     tried http://repo1.maven.org/maven2/org/apache/uima/Tagger/2.3.1/Tagger-2.3.1.pom
>
> and there it hangs - presumably trying to access
> http://repo1.maven.org/maven2/org/apache/uima/Tagger/2.3.1/Tagger-2.3.1.pom
>
> There must be something with our proxy settings that won't allow
> this.
>
> JohnC
>
>
> On 03/06/13 11:15, Uwe Schindler wrote:
>
> Hi,
>
> That's unfortunately not so easy, because of project dependencies. To
> run the test you have to compile Lucene Core, then the specific module
> plus the test framework (which is special for Lucene), and download
> some JARs from Maven Central (JAR hell, as usual).
>
> If you give me some time, I would collect all needed JAR files from my
> local checkout and provide you the correct cmd line + a ZIP file with
> maybe a shell script to start up. It should be doable, but needs some
> work to collect all dependencies for the classpath.
>
> If you want to do it quicker (should be quite fast to do):
>
> - Download the ANT 1.8.2 binary zip (unfortunately ANT 1.8.4 has a bug
>   that keeps it from working out of the box with Java 8):
>   http://archive.apache.org/dist/ant/binaries/apache-ant-1.8.2-bin.tar.gz
>   - I just wonder about the fact: isn't ANT needed to build the JDK
>   classlib by itself?
>   I remember that the FreeBSD OpenJDK build downloads ANT and does a
>   large part of the compilation using ANT...
>
> - Put the ANT bin/ dir into your PATH.
>
> - Download the Apache Lucene source code from Jenkins:
>   https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/lucene-5.0-2013-03-05_15-37-06-src.tgz
>
> - Go to the extracted Lucene source dir and call "ant ivy-bootstrap"
>   (this will download Apache Ivy, so all dependencies can be
>   downloaded from Maven Central).
>
> - Change to the module that fails: # cd analysis/uima
>
> - Execute: # ant -Dargs="-server -XX:+UseG1GC" -Dtests.multiplier=3 -Dtests.jvms=1 test
>
> - In a parallel console you might be able to attach to the process.
>   The build in the main console runs inside ANT, and the test
>   framework spawns separate worker instances of the JVM to execute the
>   tests. This makes it hard to reproduce standalone (the command line
>   passed to the child JVM is veeeeery long).
>
> I will work on putting together a precompiled ZIP file with all needed
> JARs + the command line. Just tell me if you managed with the above
> howto; then I don't need to do this.
>
> Uwe
>
> -----
> Uwe Schindler
> uschind...@apache.org
> Apache Lucene PMC Member / Committer
> Bremen, Germany
> http://lucene.apache.org/
>
>
> -----Original Message-----
> From: John Cuthbertson [mailto:john.cuthbert...@oracle.com]
> Sent: Wednesday, March 06, 2013 7:51 PM
> To: Uwe Schindler
> Cc: 'Bengt Rutisson'; hotspot-gc-...@openjdk.java.net; dev@lucene.apache.org
> Subject: Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)
>
> Hi Uwe,
>
> I've downloaded lucene-5.0-2013-03-05_15-37-06.zip from
> https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/
>
> I don't have ant on my workstation so do you have a java command line
> to run the test(s) that generate the error?
>
> Thanks,
>
> JohnC
>
> On 3/6/2013 3:16 AM, Uwe Schindler wrote:
>
> Hi,
>
> I think this is a VM bug and the thread dumps that Uwe produced are
> enough to start tracking down the root cause.
>
> I hope it is enough! If I can help with more details, tell me what I
> should do to track this down. Unfortunately, we have no isolated test
> case (like a small java class that triggers this bug) - you have to
> run the test cases of this Lucene module. It only happens there, not
> in any other Lucene test suite.
> It may be caused by a lot of GC activity in this "UIMA" module or a
> specific test.
>
> On 3/6/13 8:52 AM, David Holmes wrote:
>
> If the VM is completely unresponsive then it suggests we are at a
> safepoint.
>
> Yes, we are hanging during a stop-the-world GC, so we are at a
> safepoint.
>
> The GC threads are not "hung" in os::park, they are parked - waiting
> to be notified of something.
>
> It looks like the reference processing thread is stuck in a loop
> where it does wait(). So, the VM is hanging even if that stack trace
> also ends up in os::park().
>
> The thing is to find out why they are not being woken up.
>
> Actually, in this case we should probably not even be calling wait...
>
> Can the gdb log be posted somewhere? I don't know if the attachment
> made it to the original posting on hotspot-gc but it's no longer
> available on hotspot-dev.
>
> I received the attachment with the original email. I've attached it
> to the bug report that I created: 8009536. You can find it there if
> you want to. But I think we have a fairly good idea of what change
> caused the hang.
>
> If it helps: Unfortunately, we had some problems with recent JDK
> builds, because the javac and javadoc tools were not working
> correctly, failing to build our source code. Since b78 this was fixed.
> Until this was fixed, we used build b65 (which was the last one
> working) and the G1GC hangs did not appear on this version. So it
> must have been caused by a change after b65 and up to b78.
>
> Uwe
>
> Bengt
>
> Thanks,
> David
>
> On 6/03/2013 4:07 PM, Krystal Mok wrote:
>
> Hi Uwe,
>
> If you can attach gdb to it, then jstack -m and jstack -F should
> also work; that'll get you the Java stack trace.
> (But it probably doesn't matter in this case, because the hang is
> probably a bug in the VM).
>
> - Kris
>
> On Wed, Mar 6, 2013 at 5:48 AM, Uwe Schindler <uschind...@apache.org> wrote:
>
> Hi,
>
> For a few months we have been extensively testing various preview
> builds of JDK 8 for compatibility with Apache Lucene and Solr, so
> we can find any bugs early and prevent the problems we had with
> the release of Java 7 two years ago. Currently we have a Linux
> (Ubuntu 64bit) Jenkins machine that has various JDKs (JDK 6, JDK
> 7, JDK 8 snapshot, IBM J9, older JRockit) installed, choosing a
> different one with different hotspot and garbage collector
> settings on every run of the test suite (which takes approx. 30-45
> minutes).
>
> JDK 8 b79 works so far very well on Linux, we found some strange
> behavior in early versions (maybe compiler errors), but no longer
> at the moment. There is one configuration that constantly and
> reproducibly hangs in one module that is tested: The configuration
> uses JDK 8 b79 (same for b78), 32 bit, and G1GC (server or client
> does not matter). The JVM running the tests hangs and becomes
> unresponsive (jstack or kill -3 have no effect/cannot connect,
> standard kill does not stop it, only kill -9 actually kills it).
> It can be reproduced in this Lucene module 100% (it hangs always).
>
> I was able to connect with GDB to the JVM and get a stack trace on
> all threads (see attachment, dump.txt). As you see all threads of
> G1GC seem to hang in a syscall (os::park(), a conditional wait in
> the pthread library). Unfortunately that's all I can give you. A Java
> stack trace is not possible because the JVM reacts to neither kill
> -3 nor jstack. With all other garbage collectors it passes the
> test without hangs in a few seconds, with 32 bit G1GC it can stand
> still for hours. The 64 bit JVM passes with G1GC, so only the 32
> bit variant is affected. Client or Server VM makes no difference.
>
> To reproduce:
> - Use a 32 bit JDK 8 b78 or b79 (tested on Linux 64 bit, but this
>   should not matter)
> - Download the Lucene source code (e.g. the snapshot version we were
>   testing with:
>   https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/)
> - Change to directory lucene/analysis/uima and run:
>   ant -Dargs="-server -XX:+UseG1GC" -Dtests.multiplier=3 -Dtests.jvms=1 test
>
> After a while the test framework prints "stalled" messages
> (because the child VM actually running the test no longer
> responds). The PID is also printed. Try to get a stack trace or kill
> it: no response. Only kill -9 helps. Choosing another garbage
> collector in the above command line makes the test finish after a few
> seconds, e.g. -Dargs="-server -XX:+UseConcMarkSweepGC"
>
> I posted this bug report directly to the mailing list, because
> with earlier bug reports, there seems to be a problem with
> bugs.sun.com - there is no response from any reviewer after
> several weeks and we were able to help to find and fix javadoc and
> javac-compiler bugs early. So I hope you can help with this bug, too.
>
> Uwe
>
> -----
> Uwe Schindler
> uschind...@apache.org
> Apache Lucene PMC Member / Committer
> Bremen, Germany
> http://lucene.apache.org/
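
For readers following the thread, here is a minimal C++ sketch of the failure mode Bengt describes: a lone thread entering a barrier that expects several workers never gets released, and guarding the barrier with the is_par flag avoids that. This is not the HotSpot code. SyncBarrier, handle_overflow, and the timeout parameter are hypothetical names introduced for illustration; the timeout exists only so the example terminates, whereas in the situation Bengt debugged the waiting thread is never woken up.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for a sync barrier; NOT the HotSpot class.
// It releases its waiters once `expected` workers have entered.
class SyncBarrier {
 public:
  explicit SyncBarrier(unsigned expected) : _expected(expected) {}

  // Returns true if all expected workers arrived, false on timeout.
  // The timeout is an assumption added so the demo cannot hang.
  bool enter(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(_mutex);
    if (++_arrived >= _expected) {
      _cv.notify_all();  // last arrival wakes everybody else
      return true;
    }
    // Every other arrival waits for the rest of the workers.
    return _cv.wait_for(lock, timeout,
                        [this] { return _arrived >= _expected; });
  }

 private:
  std::mutex _mutex;
  std::condition_variable _cv;
  const unsigned _expected;
  unsigned _arrived = 0;
};

// Hypothetical overflow handler: only take part in the barrier
// protocol when processing is parallel (is_par), in the spirit of
// the guard in Bengt's patch.
bool handle_overflow(SyncBarrier& barrier, bool is_par,
                     std::chrono::milliseconds timeout) {
  if (is_par) {
    return barrier.enter(timeout);
  }
  return true;  // single threaded: there is nobody else to wait for
}
```

Calling enter() from a single thread on a barrier built for four workers times out and returns false, which stands in for the hang. Calling handle_overflow() with is_par set to false returns true immediately because the barrier is skipped.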