Yep, the marking cycle seems to help. I just don't know why. Objects in old
regions should die very infrequently, since everything produced either
survives indefinitely or is a byproduct of loading or evaluation (a very fast
operation, especially when compared to the frequency of evac pauses times the
number of survivor regions). Thus a mark cycle shouldn't reveal much to
collect in old regions, and my understanding is that all the survivor spaces
are marked+evac'd on each evac pause anyway.
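To make the allocation pattern concrete, here's a rough sketch of the shape
of the application (class and method names are invented for illustration,
not the real code):

import java.util.HashMap;
import java.util.Map;

public class ServerSketch {
    // Loaded once during the ~60s startup phase; lives for the life of the
    // JVM, so it should end up (and stay) in old regions.
    private static final Map<String, double[]> PARAMETERS = new HashMap<>();

    static void loadPhase() {
        // Stand-in for reading parameter files from disk.
        PARAMETERS.put("example", new double[1024]);
    }

    static double serveRequest(String key, double input) {
        // Per-request work: merge input with the loaded parameters, compute,
        // return. Everything allocated here is short-lived and should die young.
        double result = 0;
        for (double p : PARAMETERS.get(key)) {
            result += p * input;
        }
        return result;
    }

    public static void main(String[] args) {
        loadPhase();
        // Stand-in for the web serving loop.
        for (int i = 0; i < 1_000_000; i++) {
            serveRequest("example", i);
        }
    }
}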
Tried first with 12 workers (happens to be the number of physical cores on
my machine) and got the same pathological behavior. Then tried with 2 and
still see large termination time increases. Log file attached.

William

On Wed, Aug 31, 2016 at 8:18 AM, yu.zh...@oracle.com <yu.zh...@oracle.com> wrote:
> It seems that after marking (clean up), the termination time drops. Maybe
> that is why you need a very low ihop so that you can have more marking
> cycles.
>
> The work distribution seems fine. But system time is high. Maybe some lock
> contention.
>
> I would agree to try lowering the gc threads, -XX:ParallelGCThreads=<n>
>
> Jenny
>
> On 08/30/2016 04:08 PM, Vitaly Davidovich wrote:
>
> William,
>
> Have you tried running with a lower number (than the current 18) of
> parallel workers?
>
> On Tuesday, August 30, 2016, William Good <bkg...@gmail.com> wrote:
>>
>> I've been experiencing an issue in a production application using G1
>> for quite some time over a handful of 1.8.0 builds. The application is
>> relatively simple: it spends about 60s reading some parameters from
>> files on disk, and then starts serving web requests which merge some
>> input with those parameters, performs some computation and returns a
>> result. We're aiming to keep max total request time (as seen by remote
>> hosts) below 100 ms but from previous experience with parnew and cms
>> (and g1 on previous projects, for that matter), I didn't anticipate
>> this being a problem.
>>
>> The symptoms are an ever-increasing time spent in evacuation pauses,
>> and high parallel worker termination times stick out. With the
>> recommended set of G1 settings (max heap size and pause time target),
>> they increase sharply until I start seeing 500ms+ pause times and have
>> to kill the JVM.
>>
>> I found some time ago that first forcing a bunch of full GCs with
>> System.gc() at the phase (load -> serve) change and then forcing
>> frequent concurrent cycles with -XX:InitiatingHeapOccupancyPercent=1
>> seems to mitigate the problem. I'd prefer to have to do neither, as
>> the former makes redeployments very slow and the latter adds a couple
>> of neighboring 40ms pauses for remark and cleanup that aren't
>> good for request time targets.
>>
>> I'm attaching a log file that details a short run, with the phase
>> change at about 60s from start. After a few evacuation pauses, one
>> lasts 160ms with nearly 100-120ms spent in parallel workers'
>> 'termination'. After this, a concurrent cycle runs and everything goes
>> back to normal. java params are at the top of the file.
>>
>> Generally this happens over a much longer period of time (and
>> especially if I haven't given the low
>> -XX:InitiatingHeapOccupancyPercent value) and over many different
>> builds of 1.8.0. This was b101. It's running alone on a fairly hefty
>> dual-socket Xeon box with 128GB of RAM on CentOS 7.
>>
>> I'd be more than happy to hear any ideas on what's going on here and
>> how it could be fixed.
>>
>> Best,
>> William
>
> --
> Sent from my phone
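For reference, the flag combination under discussion would look roughly like
this (the heap size, pause target, thread count and jar name below are
placeholders, not the values from the attached log; the actual params are at
the top of the log file):

java -Xmx64g \
     -XX:+UseG1GC \
     -XX:MaxGCPauseMillis=100 \
     -XX:InitiatingHeapOccupancyPercent=1 \
     -XX:ParallelGCThreads=2 \
     -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
     -jar server.jar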
Attachment: 2workers.log.gz