William,
I think it might be related to humongous objects. Can you run some
experiments with:
-XX:+UnlockExperimentalVMOptions -XX:+G1TraceEagerReclaimHumongousObjects
Can you remove -XX:G1LogLevel=finest?
Based on the output, we can then adjust the heap region size.
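For context: G1 treats any object larger than half a region as
humongous, so a bigger region size can turn those back into ordinary
allocations. A minimal illustration, with made-up sizes:

    // With -XX:G1HeapRegionSize=8m, a 4MB array (plus its header)
    // exceeds half a region, so it is humongous and bypasses the
    // young generation:
    byte[] big = new byte[4 * 1024 * 1024];
    // With -XX:G1HeapRegionSize=16m the same array would be an
    // ordinary young-generation allocation.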
But I do think the termination time is lower with 2 worker threads.
Thanks
Jenny
On 09/01/2016 08:12 AM, William Good wrote:
Yep, the marking cycle seems to help. I just don't know why. Objects
in old regions should die very infrequently, as everything produced
either survives indefinitely or is a byproduct of loading or
evaluation (a very fast operation, especially when compared to the
frequency of evac pauses * number of survivor regions). Thus a mark
cycle shouldn't reveal much to be collected in old regions, and my
understanding is that all the survivor spaces are marked+evac'd on
each evac pause.
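Roughly, the lifetime pattern looks like this (a toy sketch, not our
actual code; names and sizes are made up):

    import java.util.HashMap;
    import java.util.Map;

    class LifetimePattern {
        // Loaded once during the ~60s load phase; lives forever.
        static final Map<String, double[]> PARAMS = new HashMap<>();

        // Per request: everything allocated on the way in and out is
        // short-lived and should die young, before promotion.
        static double handle(String key, double[] input) {
            double[] params = PARAMS.get(key);
            double sum = 0;
            for (int i = 0; i < input.length; i++) {
                sum += input[i] * params[i];
            }
            return sum;
        }
    }

So after the load phase, almost nothing should be getting promoted to
old regions in the first place.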
Tried first with 12 workers (happens to be the number of physical
cores on my machine) and got the same pathological behavior. Then
tried with 2 and still see large termination time increases. Log file
attached.
William
On Wed, Aug 31, 2016 at 8:18 AM, [email protected]
<[email protected]> wrote:
It seems that after marking (cleanup), the termination time drops. Maybe
that is why you need a very low IHOP, so that you can have more marking
cycles.
The work distribution seems fine, but system time is high. Maybe some lock
contention.
I would agree with trying a lower GC thread count: -XX:ParallelGCThreads=<n>
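For example (4 here is just a value to experiment with, not a
recommendation):

    java -XX:ParallelGCThreads=4 <rest of your existing flags>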
Jenny
On 08/30/2016 04:08 PM, Vitaly Davidovich wrote:
William,
Have you tried running with a lower number (than the current 18) of parallel
workers?
On Tuesday, August 30, 2016, William Good <[email protected]> wrote:
I've been experiencing an issue in a production application using G1
for quite some time over a handful of 1.8.0 builds. The application is
relatively simple: it spends about 60s reading some parameters from
files on disk, and then starts serving web requests that merge some
input with those parameters, perform some computation and return a
result. We're aiming to keep max total request time (as seen by remote
hosts) below 100 ms, but from previous experience with ParNew and CMS
(and G1 on previous projects, for that matter), I didn't anticipate
this being a problem.
The symptoms are ever-increasing evacuation pause times, in which high
parallel worker termination times stick out. With the
recommended set of G1 settings (max heap size and pause time target),
they increase sharply until I start seeing 500ms+ pause times and have
to kill the JVM.
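(By "recommended set" I mean essentially just

    java -XX:+UseG1GC -Xmx<size> -XX:MaxGCPauseMillis=<target> ...

with the full flag list at the top of the attached log.)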
I found some time ago that first forcing a bunch of full GCs with
System.gc() at the phase change (load -> serve) and then forcing
frequent concurrent cycles with -XX:InitiatingHeapOccupancyPercent=1
seems to mitigate the problem. I'd prefer to do neither, as
the former makes redeployments very slow and the latter adds a couple
of neighboring 40ms remark and cleanup pauses that aren't
good for request time targets.
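The workaround looks roughly like this (a sketch; three iterations is
arbitrary, and it relies on us not setting -XX:+DisableExplicitGC):

    // Called once at the load -> serve phase change, before taking
    // traffic.
    static void forcePhaseChangeGcs() {
        for (int i = 0; i < 3; i++) {
            System.gc();   // requests a full collection (not guaranteed)
        }
    }

plus -XX:InitiatingHeapOccupancyPercent=1 on the command line.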
I'm attaching a log file that details a short run, with the phase
change at about 60s from start. After a few evacuation pauses, one
lasts 160ms, with some 100-120ms spent in parallel workers'
'termination'. After this, a concurrent cycle runs and everything goes
back to normal. The Java params are at the top of the file.
Generally this happens over a much longer period of time (and
especially if I haven't set the low
-XX:InitiatingHeapOccupancyPercent value) and over many different
builds of 1.8.0. This was b101. It's running alone on a fairly hefty
dual-socket Xeon box with 128GB of RAM on CentOS 7.
I'd be more than happy to hear any ideas on what's going on here and
how it could be fixed.
Best,
William
--
Sent from my phone
_______________________________________________
hotspot-gc-use mailing list
[email protected]
http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use