Hi, I have already experienced this issue with the ParNew collector: the pause time increases between each GC and the CPU is 100% consumed by GC threads.
For me this looks like a bug in this collector in a very specific case. The case is very reproducible, but unfortunately when we encountered it the situation was quite complex, with a large application that was very sensitive to Young Gen sizing, for example. I was not able to narrow it down to a small reproducible case. As a workaround, you can change the size of the young gen. When profiling this issue with perf top, you are basically inside the collector code. I no longer have the stack trace.

Regards

On Friday, December 22, 2017 at 7:29:01 AM UTC+1, Abhishek Agarwal wrote:
>
> Thank you for the pointers. We have GC logs, but they didn't help much.
>
> A few corrections:
> Old gen size is ~20 GB
> Young gen size is ~1.5 GB
>
> The skew between young gen and old gen size has been fixed. Here is a
> snippet of the GC logs from when the pause time increased to ~200 seconds.
> By the way, the service is running on AWS.
>
> 59549.736: [GC (Allocation Failure) 59549.736: [ParNew: 1688278K->56721K(1887488K), 0.5518315 secs] 4474842K->3433110K(21480704K), 0.5526896 secs] [Times: user=1.92 sys=0.25, real=0.55 secs]
> 59581.163: [GC (Allocation Failure) 59581.164: [ParNew: 1734545K->43688K(1887488K), 0.4640033 secs] 5110934K->3941007K(21480704K), 0.4650333 secs] [Times: user=1.59 sys=0.26, real=0.46 secs]
> 59617.565: [GC (Allocation Failure) 59617.565: [ParNew: 1721512K->40500K(1887488K), 0.1154788 secs] 5618831K->3937818K(21480704K), 0.1166312 secs] [Times: user=0.44 sys=0.00, real=0.12 secs]
> 59645.910: [GC (Allocation Failure) 59645.911: [ParNew: 1718324K->46789K(1887488K), 0.1918905 secs] 5615642K->3944132K(21480704K), 0.1930935 secs] [Times: user=0.77 sys=0.00, real=0.20 secs]
> 59692.159: [GC (Allocation Failure) 59692.159: [ParNew: 1724613K->45533K(1887488K), 5.4467875 secs] 5621956K->3944000K(21480704K), 5.4480386 secs] [Times: user=21.86 sys=0.00, real=5.45 secs]
> 59716.348: [GC (Allocation Failure) 59716.348: [ParNew: 1723357K->49256K(1887488K), 8.2982698 secs] 5621824K->3948162K(21480704K), 8.2994285 secs] [Times: user=33.47 sys=0.00, real=8.30 secs]
> 59751.501: [GC (Allocation Failure) 59751.502: [ParNew: 1727080K->47156K(1887488K), 9.4546047 secs] 5625986K->3946152K(21480704K), 9.4558187 secs] [Times: user=37.88 sys=0.00, real=9.45 secs]
> 59779.293: [GC (Allocation Failure) 59779.293: [ParNew: 1724980K->31847K(1887488K), 98.3531267 secs] 5623976K->3949810K(21480704K), 98.3542271 secs] [Times: user=393.93 sys=0.00, real=98.35 secs]
> 59933.098: [GC (Allocation Failure) 59933.099: [ParNew: 1709671K->24851K(1887488K), 155.8798281 secs] 5627634K->3947185K(21480704K), 155.8810828 secs] [Times: user=629.60 sys=0.00, real=155.88 secs]
> 60138.148: [GC (Allocation Failure) 60138.148: [ParNew: 1702675K->22690K(1887488K), 162.4118957 secs] 5625009K->3945214K(21480704K), 162.4132521 secs] [Times: user=659.75 sys=0.00, real=162.42 secs]
> 60352.343: [GC (Allocation Failure) 60352.343: [ParNew: 1700514K->26504K(1887488K), 166.2566004 secs] 5623038K->3949247K(21480704K), 166.2577295 secs] [Times: user=672.08 sys=0.00, real=166.25 secs]
> 60576.273: [GC (Allocation Failure) 60576.274: [ParNew: 1704328K->49624K(1887488K), 168.2311204 secs] 5627071K->3972465K(21480704K), 168.2322492 secs] [Times: user=660.16 sys=0.00, real=168.23 secs]
> 60791.458: [GC (Allocation Failure) 60791.459: [ParNew: 1727448K->33138K(1887488K), 174.0129541 secs] 5650289K->3956199K(21480704K), 174.0139678 secs] [Times: user=680.44 sys=0.00, real=174.02 secs]
> 61022.223: [GC (Allocation Failure) 61022.223: [ParNew: 1710962K->29340K(1887488K), 176.6459303 secs] 5634023K->3952483K(21480704K), 176.6474561 secs] [Times: user=690.10 sys=0.00, real=176.64 secs]
> 61246.967: [GC (Allocation Failure) 61246.967: [ParNew: 1707164K->31914K(1887488K), 182.4714040 secs] 5630307K->3955486K(21480704K), 182.4724297 secs] [Times: user=721.68 sys=0.00, real=182.47 secs]
> 61483.620: [GC (Allocation Failure) 61483.621: [ParNew: 1709738K->27879K(1887488K), 187.2219554 secs] 5633310K->3951691K(21480704K), 187.2230896 secs] [Times: user=783.68 sys=0.00, real=187.23 secs]
> 61723.432: [GC (Allocation Failure) 61723.433: [ParNew: 1705703K->22446K(1887488K), 192.4606406 secs] 5629515K->3946498K(21480704K), 192.4618564 secs] [Times: user=721.75 sys=0.00, real=192.46 secs]
> 61965.560: [GC (Allocation Failure) 61965.561: [ParNew: 1700270K->30881K(1887488K), 197.1221819 secs] 5624322K->3955113K(21480704K), 197.1233286 secs] [Times: user=806.53 sys=0.00, real=197.13 secs]
> 62211.173: [GC (Allocation Failure) 62211.174: [ParNew: 1708705K->22526K(1887488K), 199.7864929 secs] 5632937K->3946917K(21480704K), 199.7875671 secs] [Times: user=811.19 sys=0.00, real=199.78 secs]
> 62464.936: [GC (Allocation Failure) 62464.937: [ParNew: 1700350K->47417K(1887488K), 203.1958095 secs] 5624741K->3972025K(21480704K), 203.1969591 secs] [Times: user=789.37 sys=0.00, real=203.20 secs]
> 62682.808: [GC (Allocation Failure) 62682.808: [ParNew: 1724605K->37951K(1887488K), 206.8942846 secs] 5649212K->3962723K(21480704K), 206.8948168 secs] [Times: user=821.45 sys=0.00, real=206.90 secs]
>
> On Fri, Dec 22, 2017 at 3:12 AM, Kirk Pepperdine wrote:
>
>> Hi,
>>
>> If GC logging is giving you grief, you've got bigger problems, because GC
>> logs a very small amount of data. If this is an issue, there will be a hint
>> of it in the GC logs. At any rate, the problem is with GC, and turning off
>> GC logging leaves you blind to any possible solutions.
>>
>> Kind regards,
>> Kirk Pepperdine
>>
>> On Dec 21, 2017, at 9:40 PM, Josh Humphries wrote:
>>
>> Like Erik said, check that the host isn't swapping. An old-gen GC can end
>> up scanning parts of the heap that the OS may have paged out if it
>> encountered memory pressure.
>>
>> Another issue I've seen that causes this sort of thing is if you have GC
>> logging turned on and encounter I/O issues. GC logging is not async. I've
>> seen the GC logging itself, to a drive that had a SCSI firmware issue,
>> cause silly-long GC times. You can possibly corroborate by checking kernel
>> logs to see if there are other signals of disk/driver issues.
>>
>> ----
>> Josh Humphries
>> [email protected]
>>
>> On Thu, Dec 21, 2017 at 3:36 PM, Erik Svensson wrote:
>>
>>> If it's on a virtual host, then I've seen the same behaviour when the
>>> underlying hardware started swapping. I've also seen the same when the
>>> other hosts on the hardware had high CPU demands (i.e. my host was CPU
>>> starved). But, as Martin said, GC logging always.
>>>
>>> Cheers,
>>>
>>> Erik Svensson
>>>
>>> On 21 Dec 2017, at 18:25, Abhishek Agarwal wrote:
>>>
>>> Hello everyone,
>>> Recently, one of our JVMs went into a weird state wherein a minor GC
>>> starts kicking in almost every minute, but what is most surprising is
>>> that a single minor GC is taking as much as 100 seconds. The heap size
>>> is around 12 GB and the young gen is about 30% of that. As soon as GC
>>> starts, CPU usage understandably increases as well. I am not able to
>>> figure out how one minor GC could take this much time. The machine does
>>> not have swapping enabled; I also ran sar and confirmed there was no
>>> swapping during the issue. We are using parallel GC for the young gen
>>> and CMS for the old gen. How can I go about troubleshooting this
>>> further? Any pointers will be appreciated.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "mechanical-sympathy" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
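
The pathology in the logs above can be spotted mechanically: wall-clock pause time grows by three orders of magnitude while the amount of young gen reclaimed stays roughly constant, and user time stays pinned at about (GC thread count) × real. A minimal sketch in Python, assuming the JDK 8 `-XX:+PrintGCDetails` log format quoted in the thread (the two sample lines are copied verbatim from the logs, one healthy and one pathological):

```python
import re

# Two ParNew events copied from the thread above (JDK 8 -XX:+PrintGCDetails format).
LOG = """\
59617.565: [GC (Allocation Failure) 59617.565: [ParNew: 1721512K->40500K(1887488K), 0.1154788 secs] 5618831K->3937818K(21480704K), 0.1166312 secs] [Times: user=0.44 sys=0.00, real=0.12 secs]
59779.293: [GC (Allocation Failure) 59779.293: [ParNew: 1724980K->31847K(1887488K), 98.3531267 secs] 5623976K->3949810K(21480704K), 98.3542271 secs] [Times: user=393.93 sys=0.00, real=98.35 secs]
"""

# Young gen occupancy before/after the collection, plus the Times triple.
EVENT = re.compile(
    r"\[ParNew: (\d+)K->(\d+)K\(\d+K\).*"
    r"\[Times: user=([\d.]+) sys=[\d.]+, real=([\d.]+) secs\]"
)

def analyse(log_text):
    """Return (young_reclaimed_kb, real_secs, user_over_real) per ParNew event."""
    out = []
    for m in EVENT.finditer(log_text):
        before_kb, after_kb, user, real = (float(g) for g in m.groups())
        out.append((before_kb - after_kb, real, user / real))
    return out

for reclaimed_kb, real, ratio in analyse(LOG):
    print(f"reclaimed={reclaimed_kb / 1024:7.1f}M  real={real:6.2f}s  user/real={ratio:3.1f}")
```

A user/real ratio of ~4 on its own only says that four parallel GC workers were fully busy; the red flag is that `real` grows from 0.12 s to 98 s while the reclaimed amount barely changes, which matches the "CPU 100% consumed by GC threads" symptom described at the top of the thread.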
>
> --
> Regards,
> Abhishek Agarwal
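
Several replies suggest ruling out host-level swapping (via sar, or hypervisor memory pressure on a virtual host). On Linux this can also be corroborated straight from `/proc/vmstat`, whose cumulative `pswpin`/`pswpout` counters (pages swapped in/out since boot) should not move between two samples taken around a long pause. A minimal sketch; the counter names are real kernel fields, but the sample strings below are synthetic, for illustration only (on a live box you would read `/proc/vmstat` twice):

```python
def swap_counters(vmstat_text):
    """Extract the pswpin/pswpout counters from /proc/vmstat-style text."""
    counters = {}
    for line in vmstat_text.splitlines():
        key, _, value = line.partition(" ")
        if key in ("pswpin", "pswpout"):
            counters[key] = int(value)
    return counters

# Synthetic samples taken "before" and "after" a suspicious GC pause.
before = swap_counters("nr_free_pages 812345\npswpin 1200\npswpout 3400\n")
after = swap_counters("nr_free_pages 640021\npswpin 1200\npswpout 3400\n")

delta = {k: after[k] - before[k] for k in before}
print("pages swapped during the window:", delta)  # zero deltas -> no swapping
```

Zero deltas during the pause window back up the original poster's sar observation that swapping was not the cause; non-zero deltas would point toward the hypervisor/memory-pressure explanations raised by Erik and Josh.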
