So, tried to run with IHOP=55 today, and thinks don't look better - lengthy Object Copy times, and lengthy Finalize Marking phases. E.g.:
2016-08-25T11:23:43.191+0000: 3509.335: [GC pause (GCLocker Initiated GC) (young) Desired survivor size 1795162112 bytes, new threshold 2 (max 2) - age 1: 1793677792 bytes, 1793677792 total - age 2: 641809160 bytes, 2435486952 total 3509.335: [G1Ergonomics (CSet Construction) start choosing CSet, _pending_cards: 152524, predicted base time: 45.68 ms, remaining time: 254.32 ms, target pause time: 300.00 ms] 3509.335: [G1Ergonomics (CSet Construction) add young regions to CSet, eden: 872 regions, survivors: 89 regions, predicted young region time: 20434.47 ms] 3509.335: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: 872 regions, survivors: 89 regions, old: 0 regions, predicted pause time: 20480.15 ms, target pause time: 300.00 ms] 2016-08-25T11:24:09.459+0000: 3535.603: [SoftReference, 0 refs, 0.0011022 secs]2016-08-25T11:24:09.460+0000: 3535.604: [WeakReference, 1 refs, 0.0006259 secs]2016-08-25T11:24:09.461+0000: 3535.605: [FinalReference, 2131 refs, 0.0008182 secs]2016-08-25T11:24:09.462+0000: 3535.606: [PhantomReference, 0 refs, 0 refs, 0.0013352 secs]2016-08-25T11:24:09.463+0000: 3535.607: [JNI Weak Reference, 0.0000145 secs] 3535.621: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: recent GC overhead higher than threshold after GC, recent GC overhead: 47.64 %, threshold: 10.00 %, uncommitted: 0 bytes, calculated expansion amount: 0 bytes (20.00 %)] , 26.2863292 secs] [Parallel Time: 26267.1 ms, GC Workers: 23] [GC Worker Start (ms): Min: 3509335.4, Avg: 3509335.5, Max: 3509335.7, Diff: 0.2] [Ext Root Scanning (ms): Min: 0.9, Avg: 1.2, Max: 2.6, Diff: 1.7, Sum: 26.5] [Update RS (ms): Min: 22.7, Avg: 23.7, Max: 24.4, Diff: 1.7, Sum: 546.2] [Processed Buffers: Min: 22, Avg: 34.3, Max: 54, Diff: 32, Sum: 788] [Scan RS (ms): Min: 58.0, Avg: 59.5, Max: 60.3, Diff: 2.3, Sum: 1368.7] [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.2] [Object Copy (ms): Min: 22450.6, Avg: 22615.0, Max: 22768.2, Diff: 317.5, Sum: 520143.9] [Termination (ms): Min: 3413.9, Avg: 3567.2, Max: 3732.4, Diff: 318.5, Sum: 82045.7] [Termination Attempts: Min: 601032, Avg: 630093.2, Max: 639709, Diff: 38677, Sum: 14492144] [GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.3, Diff: 0.2, Sum: 3.3] [GC Worker Total (ms): Min: 26266.6, Avg: 26266.8, Max: 26266.9, Diff: 0.3, Sum: 604136.0] [GC Worker End (ms): Min: 3535602.2, Avg: 3535602.3, Max: 3535602.4, Diff: 0.2] [Code Root Fixup: 0.1 ms] [Code Root Purge: 0.0 ms] [Clear CT: 5.3 ms] [Other: 13.8 ms] [Choose CSet: 0.0 ms] [Ref Proc: 4.3 ms] [Ref Enq: 0.2 ms] [Redirty Cards: 4.5 ms] [Humongous Register: 0.1 ms] [Humongous Reclaim: 0.1 ms] [Free CSet: 3.4 ms] [Eden: 27.2G(27.2G)->0.0B(27.2G) Survivors: 2848.0M->2880.0M Heap: 83.0G(96.0G)->56.5G(96.0G)] [Times: user=603.54 sys=0.02, real=26.29 secs] 2016-08-25T11:24:09.477+0000: 3535.621: Total time for which application threads were stopped: 26.2879077 seconds, Stopping threads took: 0.0002478 seconds 2016-08-25T11:24:10.100+0000: 3536.244: [GC concurrent-mark-end, 43.5578776 secs] 2016-08-25T11:24:10.103+0000: 3536.247: [GC remark 2016-08-25T11:24:10.103+0000: 3536.247: [Finalize Marking, 85.4564628 secs] 2016-08-25T11:25:35.559+0000: 3621.703: [GC ref-proc2016-08-25T11:25:35.559+0000: 3621.703: [SoftReference, 0 refs, 0.0011470 secs]2016-08-25T11:25:35.561+0000: 3621.705: [WeakReference, 445 refs, 0.0008184 secs]2016-08-25T11:25:35.561+0000: 3621.705: [FinalReference, 206 refs, 0.0008055 secs]2016-08-25T11:25:35.562+0000: 3621.706: [PhantomReference, 0 refs, 48 refs, 0.0015741 secs]2016-08-25T11:25:35.564+0000: 3621.708: [JNI Weak Reference, 0.0000350 secs], 0.0047317 secs] 2016-08-25T11:25:35.564+0000: 3621.708: [Unloading, 0.0163349 secs], 85.5620546 secs] [Times: user=1373.18 sys=588.01, real=85.57 secs] Jenny, I'll send you the full log offline. Thanks On Thu, Aug 25, 2016 at 7:15 AM, Vitaly Davidovich <vita...@gmail.com> wrote: > Hi Jenny, > > Thanks very much for your analysis. A few comments inline below. > > > On Wednesday, August 24, 2016, yu.zh...@oracle.com <yu.zh...@oracle.com> > wrote: > >> Vitaly, >> >> Thanks for the gc logs. >> >> Before reaching full gc and the entry at >> >> 2016-08-24T15:29:12.302+0000: 17776.029: [GC pause (G1 Evacuation Pause) >> (young) >> >> ... >> >> [Eden: 0.0B(30.0G)->0.0B(30.0G) Survivors: 0.0B->0.0B Heap: >> 95.2G(96.0G)->95.2G(96.0G)] >> There are several young gcs with 'to-space exhausted'. For example >> >> "2016-08-24T15:29:03.936+0000: 17767.663: [GC pause (G1 Evacuation Pause) >> (young) >> Desired survivor size 1795162112 bytes, new threshold 2 (max 2) >> 17767.663: [G1Ergonomics (CSet Construction) start choosing CSet, >> _pending_cards: 185708, predicted base time: 46.71 ms, remaining time: >> 253.29 ms, target pause time: 300.00 ms] >> 17767.663: [G1Ergonomics (CSet Construction) add young regions to CSet, >> eden: 159 regions, survivors: 0 regions, predicted young region time: >> 173.05 ms] >> 17767.663: [G1Ergonomics (CSet Construction) finish choosing CSet, eden: >> 159 regions, survivors: 0 regions, old: 0 regions, predicted pause time: >> 219.76 ms, target pause time: 300.00 ms] >> 17767.664: [G1Ergonomics (Heap Sizing) attempt heap expansion, reason: >> region allocation request failed, allocation request: 16340088 bytes] >> 17767.664: [G1Ergonomics (Heap Sizing) expand the heap, requested >> expansion amount: 16340088 bytes, attempted expansion amount: 33554432 >> bytes] >> 17767.664: [G1Ergonomics (Heap Sizing) did not expand the heap, reason: >> heap already fully expanded] >> ... >> >> (to-space exhausted), 8.0293588 secs] >> [Parallel Time: 6920.2 ms, GC Workers: 23] >> ... >> [Other: 1106.9 ms] >> [Evacuation Failure: 1095.6 ms] >> [Choose CSet: 0.0 ms] >> [Ref Proc: 4.4 ms] >> [Ref Enq: 0.4 ms] >> [Redirty Cards: 4.8 ms] >> [Humongous Register: 0.2 ms] >> [Humongous Reclaim: 0.2 ms] >> [Free CSet: 0.3 ms] >> [Eden: 5088.0M(30.0G)->0.0B(30.0G) Survivors: 0.0B->0.0B Heap: >> 95.2G(96.0G)->95.2G(96.0G)] >> [Times: user=68.75 sys=1.66, real=8.03 secs] >> >> Please note the high "Evacuation Failure" time. Some of those young gcs >> might collect some. Some (like this one) do not collect any heap. >> >> (to-space exhausted) means it can not find enough old regions when >> evacuating from the young regions. >> > The above log also says that Eden occupancy went from ~5G to 0. Given the > to-space exhaustion and Survivors being 0, that implies it was all > garbage. If that's the case, what exactly did it fail to evacuate? If it > failed to evacuate them, where did they go given Eden + Survivors are 0. > Also, Survivors was apparently 0 to start with, so it's unclear to me what > exactly happened here. > >> >> >> There are only 4 marking cycles, 3 of which are at the beginning due to >> meta data space. Mix gc is not trigger at that time since the heap is not >> full enough. >> >> There is only 1 marking cycle requested due to heap occupancy, but it is >> too late. The heap is very full already. No mixed gc found in the log. >> "request concurrent cycle initiation, reason: occupancy higher than >> threshold, occupancy: 88080384000 bytes, allocation request: 0 bytes, >> threshold: 77309411325 bytes (75.00 %), source: end of GC]" >> > Any ideas why the concurrent cycle is initiated when occupancy is already > well above the 75% threshold? Why wouldn't it trigger at very close to the > threshold? It's "late" by about 10G here. > >> >> >> To avoid this, as others suggested, you can try >> 1. set a reasonable pause time goal and let g1 decide the young gen size >> > Is there a rule of thumb for picking reasonable pause time goals based on > heap size/alloc rate/etc? > >> >> 2. if you have to set the young gen size fixed, reduce the NewSize, >> increase max tenure threshold to let objects die in young gen. >> 3. Reduce InitiatingHeapOccupancyPercent >> The goal is to trigger more mixed gc and avoid to-space exhausted. >> > Right, we're trying with a lower IHOP value (55%). > >> >> >> The message after 'to-space exhausted' might be confusing. I need to >> discuss this with dev team. For example, at time stamp >> 2016-08-24T15:28:05.905+0000: 17709.633: [GC pause (G1 Evacuation Pause) >> (young) >> ... >> (to-space exhausted), 2.6149566 secs] >> ... >> >> [Eden: 28.4G(28.4G)->0.0B(28.4G) Survivors: 1664.0M->1664.0M Heap: >> 93.5G(96.0G)->73.9G(96.0G)] >> >> the eden used after gc might not be true. I will do some investigation >> and get back to you. >> > Thanks. Yeah it's confusing. I'm still not sure I understand the log > snippet I pasted in my initial email of the young evac immediately > preceding the Full GC - it showed 0 regions in the CSet, so nothing was > evacuated, but it also showed Eden occupancy of 0. It's then unclear why > the Full GC triggers immediately after due to a 32 byte alloc request. > > Do you think that may be a bogus log as well? > >> >> >> Thanks >> Jenny >> > Thanks again Jenny, this is very helpful. > >> >> >> On 08/24/2016 03:50 PM, Vitaly Davidovich wrote: >> >> >> >> On Wed, Aug 24, 2016 at 6:18 PM, Jenny Zhang <yu.zh...@oracle.com> wrote: >> >>> More comments about the questions >>> >>> Thanks >>> Jenny >>> >>> On 8/24/2016 11:43 AM, Vitaly Davidovich wrote: >>> >>>> Right before the Full GC, ergonomics report a failure to expand the >>>> heap due to an allocation request of 32 bytes. Is this implying that a >>>> mutator tried to allocate 32 bytes but couldn't? How do I reconcile that >>>> with Eden+Survivor occupancy reported right above that? >>>> >>> Yes, it means the mutator tries to allocate 32byte but can not get it. >>> Heap won't expand as it already reaches max heap. >>> >>> Do you see any humongous objects allocatoin? >> >> As mentioned in my previous email, I don't see any humongous allocations >> recorded. The Humongous Register/Reclaim output does show non-zero timing, >> so not sure if the log is simply missing them or something else is going on. >> >>> >>> >>>> Young gen is sized to 30GB, total heap is 96GB. Allocation rate of the >>>> application is roughly 1GB/s. Am I correct in assuming that allocation is >>>> outpacing concurrent marking, based on the above? What tunable(s) would you >>>> advise to tweak to get G1 to keep up with the allocation rate? I'm ok >>>> taking some throughput hit to mitigate 90s+ pauses. >>>> >>>> The entire log might give a better picture. Especially if the marking >>> cycle is triggered, how well the mixed gc cleans up the heap. >>> >>> Are there particular gc lines you're looking for? I can grep for them >> quickly and provide that for you. >> >> > > > > > > > > > > -- > Sent from my phone >
_______________________________________________ hotspot-gc-use mailing list hotspot-gc-use@openjdk.java.net http://mail.openjdk.java.net/mailman/listinfo/hotspot-gc-use