Thanks Dan, that solved the main issue: I no longer have OOMs on the
core module. I'll merge your PR as soon as I've completed the full
build.
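For anyone else running into this: if I read the thread and the PR
right, the effective surefire fork arguments now amount to roughly the
following (my summary, not a verified copy of the <forkJvmArgs>
property in the root pom.xml):

    -Xmx1G -XX:+UseG1GC -XX:-TieredCompilation -XX:+HeapDumpOnOutOfMemoryError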
Interesting idea to disable TieredCompilation; I'll try that on other
projects too.

If someone is up for some additional love as follow-ups:
- raising the heap from 1G to ~1300M gives it quite a bit more
  breathing space; I believe it should still work on a 2GB testing
  machine.
- I still see quite a few MBeans in JConsole at the end of the build;
  something is leaking these, and they keep references to
  CacheManagers.
- still seeing an unreasonable number of threads as well, varying from
  ~200 to ~2000. Possibly related to the previous point? (A crude way
  to track both numbers is sketched at the bottom of this mail.)

Cheers,
Sanne

On 19 February 2018 at 11:57, Dan Berindei <dan.berin...@gmail.com> wrote:
> Ok, so the biggest problem is that TestNG keeps test instances around
> until the end of the test suite, and many of our tests are quite
> heavyweight because they keep references to caches/managers even
> after they finish. I've opened a PR to set those fields to null, fix
> some smaller leaks, and use -XX:+UseG1GC -XX:-TieredCompilation, and
> I'm getting ~11 mins on my laptop.
>
> https://github.com/infinispan/infinispan/pull/5768
>
> It's still a lot, especially knowing that not long ago it would take
> half of that, but making it shorter would probably involve looking
> deeper into the (many) tests that we've added in the last year or so.
>
> Cheers
> Dan
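To make the field-nulling concrete for anyone cleaning up other test
classes, the pattern amounts to something like this; an illustrative
sketch with made-up class and field names, not a copy of what the PR
actually does:

    import org.infinispan.manager.EmbeddedCacheManager;
    import org.testng.annotations.AfterClass;

    public class SomeClusteredTest {
       // TestNG keeps this test instance alive until the whole suite
       // finishes, so anything still referenced from a field stays
       // reachable on the heap for the rest of the run.
       private EmbeddedCacheManager manager;

       @AfterClass(alwaysRun = true)
       public void releaseHeavyReferences() {
          // The usual teardown is assumed to have stopped the manager
          // already; nulling the field just lets GC reclaim it before
          // the suite ends.
          manager = null;
       }
    }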
> On Fri, Feb 16, 2018 at 8:05 AM, Dan Berindei <dan.berin...@gmail.com>
> wrote:
>>
>> Yeah, I got a much slower run with the default collector (parallel):
>>
>> [INFO] Total time: 17:45 min
>> GC Time: 2m 43s
>> Compile time: 18m 20s
>>
>> I'm not sure if it's really the GC affecting the compile time or
>> there's another factor hiding there. But I did get a heap dump and
>> I'm analyzing it now.
>>
>> Cheers
>> Dan
>>
>> On Thu, Feb 15, 2018 at 1:59 PM, Dan Berindei <dan.berin...@gmail.com>
>> wrote:
>>>
>>> Hmmm, I didn't notice that I was running with -XX:+UseG1GC, so
>>> perhaps our test suite is a pathological case for the default
>>> collector?
>>>
>>> [INFO] Total time: 12:45 min
>>> GC Time: 52.593s
>>> Class Loader Time: 1m 26.007s
>>> Compile Time: 10m 10.216s
>>>
>>> I'll try without -XX:+UseG1GC later.
>>>
>>> Cheers
>>> Dan
>>>
>>> On Thu, Feb 15, 2018 at 1:39 PM, Dan Berindei <dan.berin...@gmail.com>
>>> wrote:
>>>>
>>>> And here I was thinking that by adding
>>>> -XX:+HeapDumpOnOutOfMemoryError, anyone would be able to look into
>>>> OOMEs and I wouldn't have to reproduce the failures myself :)
>>>>
>>>> Dan
>>>>
>>>> On Thu, Feb 15, 2018 at 1:32 PM, William Burns <mudokon...@gmail.com>
>>>> wrote:
>>>>>
>>>>> So I must admit I had noticed a while back that I was having some
>>>>> issues running the core test suite. Unfortunately, at the time CI
>>>>> and everyone else seemed not to have any issues, and I just
>>>>> ignored it because I didn't need to run the core tests then. But
>>>>> now that Sanne has pointed this out: by increasing the heap
>>>>> setting in the pom.xml, I was for the first time able to run the
>>>>> test suite to completion. It would normally hang for an extremely
>>>>> long time near the 9K-10K tests completed point and never finish
>>>>> for me (at least I didn't wait long enough).
>>>>>
>>>>> So it definitely seems there is something leaking in the test
>>>>> suite, causing the GC to use a ton of CPU time.
>>>>>
>>>>> - Will
>>>>>
>>>>> On Thu, Feb 15, 2018 at 5:40 AM Sanne Grinovero <sa...@infinispan.org>
>>>>> wrote:
>>>>>>
>>>>>> Thanks Dan.
>>>>>>
>>>>>> Do you happen to have observed the memory trend during a build?
>>>>>>
>>>>>> After a couple more attempts it passed the build once, so that
>>>>>> shows it's possible to pass.. but, even though it's a small
>>>>>> sample so far, that's 1 pass vs 3 OOMs on my machine.
>>>>>>
>>>>>> Even the one time it successfully completed the tests, I see it
>>>>>> wasted ~80% of the total build time doing GC runs.. it was likely
>>>>>> very close to falling over, and definitely not an efficient
>>>>>> setting for regular builds. Observing trends on my machine, I'd
>>>>>> guess a reasonable value to be around 5GB to keep builds fast, or
>>>>>> a minimum of 1.3GB to be able to complete successfully without
>>>>>> failing often.
>>>>>>
>>>>>> The memory issues are worse towards the end of the testsuite, and
>>>>>> steadily growing.
>>>>>>
>>>>>> I won't be able to investigate further as I need to urgently work
>>>>>> on modules, but I noticed there are quite a few MBeans according
>>>>>> to JConsole. I guess it would be good to check that we're not
>>>>>> leaking the MBean registrations, and therefore leaking (stopped?)
>>>>>> CacheManagers from there?
>>>>>>
>>>>>> Even near the beginning of the tests, when forcing a full GC, I
>>>>>> see about 400MB being "not free". That's quite a lot for some
>>>>>> simple tests, no?
>>>>>>
>>>>>> Thanks,
>>>>>> Sanne
>>>>>>
>>>>>> On 15 February 2018 at 06:51, Dan Berindei <dan.berin...@gmail.com>
>>>>>> wrote:
>>>>>> > forkJvmArgs used to be "-Xmx2G" before ISPN-8478. I reduced the
>>>>>> > heap to 1G because we were trying to run the build on agent VMs
>>>>>> > with only 4GB of RAM, and the 2GB heap was making the build run
>>>>>> > out of native memory.
>>>>>> >
>>>>>> > I've yet to see an OOME in the core tests, locally or in CI.
>>>>>> > But I also included -XX:+HeapDumpOnOutOfMemoryError in
>>>>>> > forkJvmArgs, so assuming there's a new leak, it should be easy
>>>>>> > to track down in the heap dump.
>>>>>> >
>>>>>> > Cheers
>>>>>> > Dan
>>>>>> >
>>>>>> > On Wed, Feb 14, 2018 at 11:46 PM, Sanne Grinovero
>>>>>> > <sa...@infinispan.org> wrote:
>>>>>> >>
>>>>>> >> Hey all,
>>>>>> >>
>>>>>> >> I'm having OOMs running the tests of infinispan-core.
>>>>>> >>
>>>>>> >> Initially I thought it was related to limits and security, as
>>>>>> >> that's the usual suspect, but no, it's really just not enough
>>>>>> >> memory :)
>>>>>> >>
>>>>>> >> I found that the root pom.xml sets a <forkJvmArgs> property to
>>>>>> >> Xmx1G for surefire; I've been observing the growth of heap
>>>>>> >> usage in JConsole and it's clearly not enough.
>>>>>> >>
>>>>>> >> What surprises me is that - as an occasional tester - I
>>>>>> >> shouldn't be the one to notice such a new requirement first. A
>>>>>> >> leak which only manifests in certain conditions?
>>>>>> >>
>>>>>> >> What do others observe?
>>>>>> >>
>>>>>> >> FWIW, I'm running it with an 8G heap now and it's working much
>>>>>> >> better; still a couple of failures, but at least they're not
>>>>>> >> OOM related.
>>>>>> >>
>>>>>> >> Thanks,
>>>>>> >> Sanne
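Since the MBean and thread counts keep coming up in this thread, here's
a crude way to print both from inside the forked test JVM, e.g. at the
very end of the suite. A sketch only: the class is made up, and it
assumes Infinispan's default "org.infinispan" JMX domain:

    import java.lang.management.ManagementFactory;
    import javax.management.MBeanServer;
    import javax.management.ObjectName;

    public final class LeakReport {
       // Run inside the forked test JVM once all tests have stopped.
       public static void print() throws Exception {
          int threads = ManagementFactory.getThreadMXBean().getThreadCount();
          MBeanServer server = ManagementFactory.getPlatformMBeanServer();
          // Registrations lingering here after the suite would be the
          // leaked entries keeping (stopped?) CacheManagers reachable.
          int mbeans = server.queryNames(
                new ObjectName("org.infinispan:*"), null).size();
          System.out.printf("live threads=%d, org.infinispan MBeans=%d%n",
                threads, mbeans);
       }
    }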
_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev