Awesome work all!

On Thu, Mar 7, 2013 at 7:24 AM, Bengt Rutisson <bengt.rutis...@oracle.com> wrote:

>
> John and Uwe,
>
> I followed the original instruction sent out by Uwe to reproduce the test.
> I got it up and running on my Windows x64 workstation using a 32 bit
> binary. The test hangs every time I run it.
>
> John, I think your proxy issues are due to the fact that ant picks up its
> proxy settings from Java, so you need to set the system properties
> http.proxyHost and http.proxyPort. I did this by exporting the
> _JAVA_OPTIONS environment variable as:
>
> _JAVA_OPTIONS=-Dhttp.proxyHost=<oracle www proxy> -Dhttp.proxyPort=<oracle
> proxy port>
>
> Let me know if this does not work for you. We can try to debug it offline.
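
For anyone scripting the workaround above, here is a minimal Python sketch that builds the _JAVA_OPTIONS value and passes it to a child process through its environment. The host proxy.example.com and port 8080 are placeholders, and the helper names java_proxy_options and env_with_proxy are made up for this illustration; only the http.proxyHost and http.proxyPort property names come from the message.

```python
import os

def java_proxy_options(host, port):
    """Build the -D flags for the http.proxyHost/http.proxyPort system properties."""
    return f"-Dhttp.proxyHost={host} -Dhttp.proxyPort={port}"

def env_with_proxy(host, port, base_env=None):
    """Return a copy of the environment with _JAVA_OPTIONS extended for child JVMs."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("_JAVA_OPTIONS", "")
    # Append rather than overwrite, so options that are already set survive.
    env["_JAVA_OPTIONS"] = (existing + " " + java_proxy_options(host, port)).strip()
    return env

if __name__ == "__main__":
    # Placeholder proxy values (assumption); substitute your site's settings.
    env = env_with_proxy("proxy.example.com", 8080, base_env={})
    print(env["_JAVA_OPTIONS"])
```

In practice one would call env_with_proxy(host, port) without base_env so that PATH and the rest of the environment are inherited, and pass the result as env= to subprocess.run(["ant", "ivy-bootstrap"], ...).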
>
> Since I could catch the hang in a debugger I could confirm both that the
> hang is indeed related to the recent change to the
> DrainMarkingStackClosures and that the problem is that we enter the
> termination protocol even when reference processing is single threaded.
>
> Looking at the comment in the constructor for G1CMDrainMarkingStackClosure:
>
>     // We only allow stealing and only enter the termination protocol
>     // in CMTask::do_marking_step() if this closure is being instantiated
>     // for parallel reference processing.
>     _do_stealing = _do_termination = is_par;
>
> I came up with a patch that makes the test work again. But I leave it to
> you, John, to figure out if this is the right way to solve the problem.
>
> diff --git a/src/share/vm/gc_implementation/g1/concurrentMark.cpp
> b/src/share/vm/gc_implementation/g1/concurrentMark.cpp
> --- a/src/share/vm/gc_implementation/g1/concurrentMark.cpp
> +++ b/src/share/vm/gc_implementation/g1/concurrentMark.cpp
> @@ -4336,7 +4336,9 @@
>          gclog_or_tty->print_cr("[%u] detected overflow", _worker_id);
>        }
>
> + if (do_stealing || do_termination) {
>        _cm->enter_first_sync_barrier(_worker_id);
> + }
>        // When we exit this sync barrier we know that all tasks have
>        // stopped doing marking work. So, it's now safe to
>        // re-initialise our data structures. At the end of this method,
> @@ -4347,8 +4349,10 @@
>        // We clear the local state of this task...
>        clear_region_fields();
>
> + if (do_stealing || do_termination) {
>        // ...and enter the second barrier.
>        _cm->enter_second_sync_barrier(_worker_id);
> + }
>        // At this point everything has bee re-initialised and we're
>        // ready to restart.
>      }
>
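
The failure mode described above can be illustrated with a toy model: a barrier sized for several workers never releases a lone caller, and guarding the call avoids the wait. This is a rough Python sketch, not HotSpot code; threading.Barrier stands in for the sync barriers, the timeout exists only so the demo returns instead of hanging, and enter_barrier and handle_overflow are hypothetical names.

```python
import threading

def enter_barrier(barrier, timeout=0.2):
    """Wait on the barrier; return False instead of hanging if no peers arrive."""
    try:
        barrier.wait(timeout=timeout)
        return True
    except threading.BrokenBarrierError:
        barrier.reset()  # a timed-out barrier is left broken; make it usable again
        return False

def handle_overflow(barrier, do_stealing, do_termination):
    """Same shape as the patch: only synchronise when running in parallel."""
    if do_stealing or do_termination:
        return enter_barrier(barrier)
    return True  # serial caller has no peers to wait for

if __name__ == "__main__":
    barrier = threading.Barrier(4)                 # sized for 4 workers
    print(enter_barrier(barrier))                  # lone caller, unguarded
    print(handle_overflow(barrier, False, False))  # lone caller, guarded
```

Without the timeout, the unguarded call would block indefinitely, which corresponds to the single-threaded reference processing case reported in this thread.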
>
> Thanks,
> Bengt
>
>
> On 3/7/13 7:44 AM, Uwe Schindler wrote:
>
> Hi John,
>
> I only have time to work on a setup this evening German time, because I am
> on a business trip today. Will come back to you. Unfortunately I failed to
> quickly set up an easy classpath without Ivy downloading the JARs.
>
> Uwe
>
> -----
> Uwe Schindler
> uschind...@apache.org
> Apache Lucene PMC Member / Committer
> Bremen, Germany
> http://lucene.apache.org/
>
> From: John Cuthbertson [mailto:john.cuthbert...@oracle.com]
> Sent: Thursday, March 07, 2013 12:49 AM
> To: Uwe Schindler
> Cc: 'Bengt Rutisson'; hotspot-gc-...@openjdk.java.net;
> dev@lucene.apache.org
> Subject: Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32
> bit)
>
> Hi Uwe,
>
> An update:
>
> I have downloaded ant and the Lucene source.
>
> I attempted the ivy-bootstrap but it failed to download the ivy-2.3.0.jar
> file - even after setting:
>
> ANT_OPTS=-Dhttp.proxyHost=<...> -Dhttp.proxyPort=<...>
>
> So I manually downloaded it and placed it into the ANT library directory,
> and now get:
>
> ivy-bootstrap1:
>     [mkdir] Skipping /home/jcuthber/.ant/lib because it already exists.
>      [echo] installing ivy 2.3.0 to /home/jcuthber/.ant/lib
>       [get] Getting:
> http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar
>       [get] To: /home/jcuthber/.ant/lib/ivy-2.3.0.jar
>       [get] Error getting
> http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.3.0/ivy-2.3.0.jar to
> /home/jcuthber/.ant/lib/ivy-2.3.0.jar
> [available] Found: /home/jcuthber/.ant/lib/ivy-2.3.0.jar
>
> ivy-bootstrap2:
> Skipped because property 'ivy.bootstrap1.success' set.
>
> ivy-checksum:
>
> ivy-bootstrap:
>
> BUILD SUCCESSFUL
> Total time: 3 minutes 46 seconds
>
> Presumably I have to build the Lucene source before executing the tests.
> That seemed to go OK.
>
> When I run the analysis/uima tests it seems to get hung up at the
> "resolve" target - even without specifying G1:
>
> cairnapple{jcuthber}:408> cd analysis/uima/
> cairnapple{jcuthber}:409> ls -l
> total 29
> -rw-r--r--   1 jcuthber staff       1473 Dec 10 10:39 build.xml
> -rw-rw-r--   1 jcuthber staff       6895 Mar  6 15:20 hotspot.log
> -rw-r--r--   1 jcuthber staff       1316 Mar 30  2012 ivy.xml
> drwxr-xr-x   2 jcuthber staff          2 Mar  5 07:42 lib/
> drwxr-xr-x   6 jcuthber staff          6 Mar  5 07:42 src/
>
> ivy-configure:
> [ivy:configure] Loading
> jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivy.properties
> [ivy:configure] :: Apache Ivy 2.3.0 - 20130110142753 ::
> http://ant.apache.org/ivy/ ::
> [ivy:configure] jakarta commons httpclient not found: using jdk url
> handling
> [ivy:configure] :: loading settings :: file =
> /export/bugs/8009536/lucene-5.0-2013-03-05_15-37-06/ivy-settings.xml
> [ivy:configure] no default ivy user dir defined: set to
> /home/jcuthber/.ivy2
> [ivy:configure] including url:
> jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings-public.xml
> [ivy:configure] no default cache defined: set to /home/jcuthber/.ivy2/cache
> [ivy:configure] including url:
> jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings-shared.xml
> [ivy:configure] including url:
> jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings-local.xml
> [ivy:configure] including url:
> jar:file:/home/jcuthber/.ant/lib/ivy-2.3.0.jar!/org/apache/ivy/core/settings/ivysettings-main-chain.xml
> [ivy:configure] settings loaded (289ms)
> [ivy:configure]         default cache: /home/jcuthber/.ivy2/cache
> [ivy:configure]         default resolver: default
> [ivy:configure]         -- 7 resolvers:
> [ivy:configure]         working-chinese-mirror [ibiblio]
> [ivy:configure]         main [chain] [shared, public]
> [ivy:configure]         local [file]
> [ivy:configure]         shared [file]
> [ivy:configure]         sonatype-releases [ibiblio]
> [ivy:configure]         public [ibiblio]
> [ivy:configure]         default [chain] [local, main, sonatype-releases,
> working-chinese-mirror]
>
> resolve:
> [ivy:retrieve] no resolved descriptor found: launching default resolve
> Overriding previous definition of property "ivy.version"
> [ivy:retrieve] using ivy parser to parse
> file:/export/bugs/8009536/lucene-5.0-2013-03-05_15-37-06/analysis/uima/ivy.xml
> [ivy:retrieve] :: resolving dependencies ::
> org.apache.lucene#analyzers-uima;working@cairnapple
> [ivy:retrieve]  confs: [default]
> [ivy:retrieve]  validate = true
> [ivy:retrieve]  refresh = false
> [ivy:retrieve] resolving dependencies for configuration 'default'
> [ivy:retrieve] == resolving dependencies for
> org.apache.lucene#analyzers-uima;working@cairnapple [default]
> [ivy:retrieve] == resolving dependencies
> org.apache.lucene#analyzers-uima;working@cairnapple->org.apache.uima#Tagger;2.3.1
> [default->*]
> [ivy:retrieve] default: Checking cache for: dependency:
> org.apache.uima#Tagger;2.3.1 {*=[*]}
> [ivy:retrieve] don't use cache for org.apache.uima#Tagger;2.3.1:
> checkModified=true
> [ivy:retrieve]          tried
> /home/jcuthber/.ivy2/local/org.apache.uima/Tagger/2.3.1/ivys/ivy.xml
> [ivy:retrieve]          tried
> /home/jcuthber/.ivy2/local/org.apache.uima/Tagger/2.3.1/jars/Tagger.jar
> [ivy:retrieve]  local: no ivy file nor artifact found for
> org.apache.uima#Tagger;2.3.1
> [ivy:retrieve] main: Checking cache for: dependency:
> org.apache.uima#Tagger;2.3.1 {*=[*]}
> [ivy:retrieve]          tried
> /home/jcuthber/.ivy2/shared/org.apache.uima/Tagger/2.3.1/ivys/ivy.xml
> [ivy:retrieve]          tried
> /home/jcuthber/.ivy2/shared/org.apache.uima/Tagger/2.3.1/jars/Tagger.jar
> [ivy:retrieve]  shared: no ivy file nor artifact found for
> org.apache.uima#Tagger;2.3.1
> [ivy:retrieve]          tried
> http://repo1.maven.org/maven2/org/apache/uima/Tagger/2.3.1/Tagger-2.3.1.pom
>
> and there it hangs - presumably trying to access
> http://repo1.maven.org/maven2/org/apache/uima/Tagger/2.3.1/Tagger-2.3.1.pom
>
> There must be something in our proxy settings that won't allow this.
>
> JohnC
>
>
> On 03/06/13 11:15, Uwe Schindler wrote:
>
> Hi,
>
> That's unfortunately not so easy, because of project dependencies. To run the
> test you have to compile Lucene Core, then the specific module plus the test
> framework (which is special for Lucene), and download some JARs from Maven
> Central (JAR hell, as usual).
>
> If you give me some time, I would collect all needed JAR files from my local
> checkout and provide you the correct cmd line + a ZIP file with maybe a shell
> script to start up. It should be doable, but needs some work to collect all
> dependencies for the classpath.
>
> If you want to do it quicker (should be quite fast to do):
>
> - Download the ANT 1.8.2 binary archive (unfortunately ANT 1.8.4 has a bug
> that keeps it from working out of the box with Java 8):
> http://archive.apache.org/dist/ant/binaries/apache-ant-1.8.2-bin.tar.gz - I
> just wonder: isn't ANT needed to build the JDK classlib itself? I remember
> that the FreeBSD OpenJDK build downloads ANT and does a large part of the
> compilation using ANT...
>
> - Put the ANT bin/ dir into your PATH.
>
> - Download the Apache Lucene source code from Jenkins:
> https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/lucene-5.0-2013-03-05_15-37-06-src.tgz
>
> - Go to the extracted Lucene source dir and call "ant ivy-bootstrap" (this
> will download Apache Ivy, so all dependencies can be downloaded from Maven
> Central).
>
> - Change to the module that fails: # cd analysis/uima
>
> - Execute: # ant -Dargs="-server -XX:+UseG1GC" -Dtests.multiplier=3
> -Dtests.jvms=1 test
>
> - In a parallel console you might be able to attach to the process. The build
> in the main console runs inside ANT, and the test framework spawns separate
> worker instances of the JVM to execute the tests. This makes it hard to
> reproduce standalone (the command line passed to the child JVM is
> veeeeery long).
>
> I will work on putting together a precompiled ZIP file with all needed JARs +
> the command line. Just tell me if you managed it with the above howto,
> then I don't need to do this.
>
> Uwe
>
> -----
> Uwe Schindler
> uschind...@apache.org
> Apache Lucene PMC Member / Committer
> Bremen, Germany
> http://lucene.apache.org/
>
> -----Original Message-----
> From: John Cuthbertson [mailto:john.cuthbert...@oracle.com]
> Sent: Wednesday, March 06, 2013 7:51 PM
> To: Uwe Schindler
> Cc: 'Bengt Rutisson'; hotspot-gc-...@openjdk.java.net;
> dev@lucene.apache.org
> Subject: Re: JVM hanging when using G1GC on JDK8 b78 or b79 (Linux 32 bit)
>
> Hi Uwe,
>
> I've downloaded lucene-5.0-2013-03-05_15-37-06.zip from
> https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/
>
> I don't have ant on my workstation so do you have a java command line to
> run the test(s) that generate the error?
>
> Thanks,
>
> JohnC
>
> On 3/6/2013 3:16 AM, Uwe Schindler wrote:
>
> Hi,
>
> I think this is a VM bug and the thread dumps that Uwe produced are
> enough to start tracking down the root cause.
>
> I hope it is enough! If I can help with more details, tell me what I should
> do to track this down. Unfortunately, we have no isolated test case (like a
> small Java class that triggers this bug) - you have to run the test cases of
> this Lucene module. It only happens there, not in any other Lucene test
> suite. It may be caused by a lot of GC activity in this "UIMA" module or a
> specific test.
>
> On 3/6/13 8:52 AM, David Holmes wrote:
>
> If the VM is completely unresponsive then it suggests we are at a
> safepoint.
>
> Yes, we are hanging during a stop-the-world GC, so we are at a safepoint.
>
> The GC threads are not "hung" in os::park, they are parked - waiting
> to be notified of something.
>
> It looks like the reference processing thread is stuck in a loop
> where it does wait(). So, the VM is hanging even if that stack trace
> also ends up in os::park().
>
> The thing is to find out why they are not being woken up.
>
> Actually, in this case we should probably not even be calling wait...
>
> Can the gdb log be posted somewhere? I don't know if the attachment
> made it to the original posting on hotspot-gc but it's no longer
> available on hotspot-dev.
>
> I received the attachment with the original email. I've attached it
> to the bug report that I created: 8009536. You can find it there if
> you want to. But I think we have a fairly good idea of what change
> caused the hang.
>
> If it helps: Unfortunately, we had some problems with recent JDK builds,
> because javac and javadoc tools were not working correctly, failing to
> build our source code. Since b78 this was fixed. Until this was fixed, we
> used build b65 (which was the last one working) and the G1GC hangs did not
> appear on this version. So it must have been caused by a change after b65
> and up to b78.
>
> Uwe
>
> Bengt
>
> Thanks,
> David
>
> On 6/03/2013 4:07 PM, Krystal Mok wrote:
>
> Hi Uwe,
>
> If you can attach gdb to it, then jstack -m and jstack -F should
> also work; that'll get you the Java stack trace.
> (But it probably doesn't matter in this case, because the hang is
> probably a bug in the VM.)
>
> - Kris
>
> On Wed, Mar 6, 2013 at 5:48 AM, Uwe Schindler
> <uschind...@apache.org> wrote:
>
> Hi,
>
> For a few months we have been extensively testing various preview
> builds of JDK 8 for compatibility with Apache Lucene and Solr, so
> we can find any bugs early and prevent the problems we had with
> the release of Java 7 two years ago. Currently we have a Linux
> (Ubuntu 64bit) Jenkins machine that has various JDKs (JDK 6, JDK
> 7, JDK 8 snapshot, IBM J9, older JRockit) installed, choosing a
> different one with different hotspot and garbage collector
> settings on every run of the test suite (which takes approx. 30-45
> minutes).
>
> JDK 8 b79 works very well so far on Linux; we found some strange
> behavior in early versions (maybe compiler errors), but no longer
> at the moment. There is one configuration that constantly and
> reproducibly hangs in one module that is tested: The configuration
> uses JDK 8 b79 (same for b78), 32 bit, and G1GC (server or client
> does not matter). The JVM running the tests becomes completely
> unresponsive (jstack or kill -3 have no effect/cannot connect, standard
> kill does not stop it, only kill -9 actually kills it). It can be
> reproduced in this Lucene module 100% (it hangs always).
>
> I was able to connect with GDB to the JVM and get a stack trace on
> all threads (see attachment, dump.txt). As you see all threads of
> G1GC seem to hang in a syscall (os::park(), a conditional wait in
> pthread library). Unfortunately that's all I can give you. A Java
> stacktrace is not possible because the JVM reacts to neither kill
> -3 nor jstack. With all other garbage collectors it passes the
> test without hangs in a few seconds, with 32 bit G1GC it can stand
> still for hours. The 64 bit JVM passes with G1GC, so only the 32
> bit variant is affected. Client or Server VM makes no difference.
>
> To reproduce:
> - Use a 32 bit JDK 8 b78 or b79 (tested on Linux 64 bit, but this
> should not matter)
> - Download Lucene Source code (e.g. the snapshot version we were
> testing with:
> https://builds.apache.org/job/Lucene-Artifacts-trunk/2212/artifact/lucene/dist/)
> - change to directory lucene/analysis/uima and run:
>           ant -Dargs="-server -XX:+UseG1GC" -Dtests.multiplier=3 -Dtests.jvms=1 test
> After a while the test framework prints "stalled" messages
> (because the child VM actually running the test no longer
> responds). The PID is also printed. Try to get a stack trace or kill it:
> no response.
>
> Only kill -9 helps. Choosing another garbage collector in the
> above command line makes the test finish after a few seconds, e.g.
> -Dargs="-server -XX:+UseConcMarkSweepGC"
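
Since the report says the hung JVM ignores everything except kill -9, an automated run needs a hard kill as a fallback. The following is a minimal Python sketch of such a watchdog, under the assumption that the script itself started the process to be watched; run_with_watchdog is a hypothetical helper, and a sleeping child stands in for the hung JVM. It would not by itself reach a test JVM forked by ant, whose PID the test framework prints separately.

```python
import subprocess
import sys

def run_with_watchdog(cmd, timeout_s):
    """Run cmd; if it has not exited after timeout_s seconds, kill it hard.

    Returns (exit_code, timed_out).
    """
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait(timeout=timeout_s), False
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX, i.e. the "kill -9" from the report
        return proc.wait(), True

if __name__ == "__main__":
    # Stand-in for the hung JVM: a child that sleeps far longer than the timeout.
    sleeper = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_with_watchdog(sleeper, timeout_s=1))
```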
>
> I posted this bug report directly to the mailing list, because
> with earlier bug reports, there seems to be a problem with
> bugs.sun.com - there is no response from any reviewer after
> several weeks and we were able to help to find and fix javadoc and
> javac-compiler bugs early. So I hope you can help with this bug, too.
>
> Uwe
>
> -----
> Uwe Schindler
> uschind...@apache.org
> Apache Lucene PMC Member / Committer
> Bremen, Germany
> http://lucene.apache.org/
