Hi,

This log doesn’t really show anything interesting. I can’t see any real reason why these collections should be taking so long; there is nothing unusual about them. There is no hint of obvious interference from virtualization, but one has to believe that is the issue in this particular case. Too bad AWS doesn’t give you proper performance counters that would let you prove the point.
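[Editor's note: on a JDK 8 HotSpot JVM, which the ParNew/CMS output below suggests, the diagnostics requested in the next paragraph map to roughly the following flags. Treat the exact spelling as an assumption to check against your JDK version:

    -XX:+PrintGCDetails
    -XX:+PrintGCTimeStamps
    -XX:+PrintTenuringDistribution          (tenuring details)
    -XX:+PrintReferenceGC                   (reference processing times)
    -XX:+PrintGCApplicationStoppedTime
    -XX:+PrintGCApplicationConcurrentTime
]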
Please do send another log. Please turn on Tenuring Details and Reference Processing times, as well as ApplicationGCStoppedTime and ApplicationGCConcurrentTime. That will help eliminate other sources of pause. If those all look clean, then you can blame AWS.

Kind regards,
Kirk Pepperdine

> On Dec 22, 2017, at 7:28 AM, Abhishek Agarwal <[email protected]> wrote:
>
> Thank you for the pointers. We have GC logs, but they didn't help much.
>
> A few corrections:
> Old gen size is ~20 GB
> Young gen size is ~1.5 GB
>
> The skew between young gen and old gen size has been fixed. Here is a
> snippet of the GC logs from when the pause time increased to ~200 seconds.
> By the way, the service is running on AWS.
>
> 59549.736: [GC (Allocation Failure) 59549.736: [ParNew: 1688278K->56721K(1887488K), 0.5518315 secs] 4474842K->3433110K(21480704K), 0.5526896 secs] [Times: user=1.92 sys=0.25, real=0.55 secs]
> 59581.163: [GC (Allocation Failure) 59581.164: [ParNew: 1734545K->43688K(1887488K), 0.4640033 secs] 5110934K->3941007K(21480704K), 0.4650333 secs] [Times: user=1.59 sys=0.26, real=0.46 secs]
> 59617.565: [GC (Allocation Failure) 59617.565: [ParNew: 1721512K->40500K(1887488K), 0.1154788 secs] 5618831K->3937818K(21480704K), 0.1166312 secs] [Times: user=0.44 sys=0.00, real=0.12 secs]
> 59645.910: [GC (Allocation Failure) 59645.911: [ParNew: 1718324K->46789K(1887488K), 0.1918905 secs] 5615642K->3944132K(21480704K), 0.1930935 secs] [Times: user=0.77 sys=0.00, real=0.20 secs]
> 59692.159: [GC (Allocation Failure) 59692.159: [ParNew: 1724613K->45533K(1887488K), 5.4467875 secs] 5621956K->3944000K(21480704K), 5.4480386 secs] [Times: user=21.86 sys=0.00, real=5.45 secs]
> 59716.348: [GC (Allocation Failure) 59716.348: [ParNew: 1723357K->49256K(1887488K), 8.2982698 secs] 5621824K->3948162K(21480704K), 8.2994285 secs] [Times: user=33.47 sys=0.00, real=8.30 secs]
> 59751.501: [GC (Allocation Failure) 59751.502: [ParNew: 1727080K->47156K(1887488K), 9.4546047 secs] 5625986K->3946152K(21480704K), 9.4558187 secs] [Times: user=37.88 sys=0.00, real=9.45 secs]
> 59779.293: [GC (Allocation Failure) 59779.293: [ParNew: 1724980K->31847K(1887488K), 98.3531267 secs] 5623976K->3949810K(21480704K), 98.3542271 secs] [Times: user=393.93 sys=0.00, real=98.35 secs]
> 59933.098: [GC (Allocation Failure) 59933.099: [ParNew: 1709671K->24851K(1887488K), 155.8798281 secs] 5627634K->3947185K(21480704K), 155.8810828 secs] [Times: user=629.60 sys=0.00, real=155.88 secs]
> 60138.148: [GC (Allocation Failure) 60138.148: [ParNew: 1702675K->22690K(1887488K), 162.4118957 secs] 5625009K->3945214K(21480704K), 162.4132521 secs] [Times: user=659.75 sys=0.00, real=162.42 secs]
> 60352.343: [GC (Allocation Failure) 60352.343: [ParNew: 1700514K->26504K(1887488K), 166.2566004 secs] 5623038K->3949247K(21480704K), 166.2577295 secs] [Times: user=672.08 sys=0.00, real=166.25 secs]
> 60576.273: [GC (Allocation Failure) 60576.274: [ParNew: 1704328K->49624K(1887488K), 168.2311204 secs] 5627071K->3972465K(21480704K), 168.2322492 secs] [Times: user=660.16 sys=0.00, real=168.23 secs]
> 60791.458: [GC (Allocation Failure) 60791.459: [ParNew: 1727448K->33138K(1887488K), 174.0129541 secs] 5650289K->3956199K(21480704K), 174.0139678 secs] [Times: user=680.44 sys=0.00, real=174.02 secs]
> 61022.223: [GC (Allocation Failure) 61022.223: [ParNew: 1710962K->29340K(1887488K), 176.6459303 secs] 5634023K->3952483K(21480704K), 176.6474561 secs] [Times: user=690.10 sys=0.00, real=176.64 secs]
> 61246.967: [GC (Allocation Failure) 61246.967: [ParNew: 1707164K->31914K(1887488K), 182.4714040 secs] 5630307K->3955486K(21480704K), 182.4724297 secs] [Times: user=721.68 sys=0.00, real=182.47 secs]
> 61483.620: [GC (Allocation Failure) 61483.621: [ParNew: 1709738K->27879K(1887488K), 187.2219554 secs] 5633310K->3951691K(21480704K), 187.2230896 secs] [Times: user=783.68 sys=0.00, real=187.23 secs]
> 61723.432: [GC (Allocation Failure) 61723.433: [ParNew: 1705703K->22446K(1887488K), 192.4606406 secs] 5629515K->3946498K(21480704K), 192.4618564 secs] [Times: user=721.75 sys=0.00, real=192.46 secs]
> 61965.560: [GC (Allocation Failure) 61965.561: [ParNew: 1700270K->30881K(1887488K), 197.1221819 secs] 5624322K->3955113K(21480704K), 197.1233286 secs] [Times: user=806.53 sys=0.00, real=197.13 secs]
> 62211.173: [GC (Allocation Failure) 62211.174: [ParNew: 1708705K->22526K(1887488K), 199.7864929 secs] 5632937K->3946917K(21480704K), 199.7875671 secs] [Times: user=811.19 sys=0.00, real=199.78 secs]
> 62464.936: [GC (Allocation Failure) 62464.937: [ParNew: 1700350K->47417K(1887488K), 203.1958095 secs] 5624741K->3972025K(21480704K), 203.1969591 secs] [Times: user=789.37 sys=0.00, real=203.20 secs]
> 62682.808: [GC (Allocation Failure) 62682.808: [ParNew: 1724605K->37951K(1887488K), 206.8942846 secs] 5649212K->3962723K(21480704K), 206.8948168 secs] [Times: user=821.45 sys=0.00, real=206.90 secs]
>
> On Fri, Dec 22, 2017 at 3:12 AM, Kirk Pepperdine <[email protected]> wrote:
> Hi,
>
> If GC logging is giving you grief, you've got bigger problems, because GC logging writes only a very small amount of data. If that were the issue, there would be a hint of it in the GC logs. At any rate, the problem is with GC, and turning off GC logging leaves you blind to any possible solutions.
>
> Kind regards,
> Kirk Pepperdine
>
>> On Dec 21, 2017, at 9:40 PM, Josh Humphries <[email protected]> wrote:
>>
>> Like Erik said, check that the host isn't swapping. An old-gen GC can end up scanning parts of the heap that the OS may have paged out if it encountered memory pressure.
>>
>> Another issue I've seen cause this sort of thing is I/O trouble combined with GC logging. GC logging is not async. I've seen the GC logging itself, to a drive that had a SCSI firmware issue, cause silly-long GC times.
>> You can possibly corroborate by checking the kernel logs to see if there are other signals of disk/driver issues.
>>
>> ----
>> Josh Humphries
>> [email protected]
>>
>> On Thu, Dec 21, 2017 at 3:36 PM, Erik Svensson <[email protected]> wrote:
>> If it's on a virtual host, then I've seen the same behaviour when the underlying hardware started swapping. I've also seen it when the other guests on the same hardware had high CPU demand (i.e. my host was CPU-starved).
>> But, as Martin said: GC logging, always.
>>
>> Cheers,
>> Erik Svensson
>>
>>> On 21 Dec 2017, at 18:25, Abhishek Agarwal <[email protected]> wrote:
>>>
>>> Hello everyone,
>>> Recently, one of my JVMs went into a weird state in which minor GCs started kicking in almost every minute, and, most surprisingly, a single minor GC was taking as much as 100 seconds. The heap size is around 12 GB and the young gen is about 30% of that. As soon as GC starts, CPU usage understandably starts increasing as well. I am not able to figure out how one minor GC could take this much time. The machine does not have swapping enabled; I also ran sar and confirmed there was no swapping during the issue. We are using parallel GC for the young gen and CMS for the old gen. How can I go about troubleshooting this further? Any pointers will be appreciated.
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
>
> --
> Regards,
> Abhishek Agarwal
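[Editor's footnote: the `[Times: user=... sys=..., real=...]` stamps in the log above carry a clue of their own. Throughout the snippet, user stays at roughly 4 × real (e.g. user=393.93 vs real=98.35), so the ParNew worker threads, presumably four of them, were genuinely on-CPU for the entire pause rather than blocked on I/O or paged-out memory. A small sketch to extract that ratio from a JDK 8-style log line; the regex and helper name are illustrative, not from any tool mentioned in the thread:

```python
import re

# Matches the "[Times: user=... sys=..., real=... secs]" tail that HotSpot
# (JDK 8 era) appends to each GC log line, as in the snippet above.
TIMES_RE = re.compile(r"\[Times: user=([0-9.]+) sys=([0-9.]+), real=([0-9.]+) secs\]")

def cpu_ratio(log_line):
    """Return (user + sys) / real for one GC log line, or None if no match."""
    m = TIMES_RE.search(log_line)
    if m is None:
        return None
    user, sys_secs, real = (float(g) for g in m.groups())
    return (user + sys_secs) / real if real > 0 else None

# Two lines taken from the log in the thread: a short pause and a 98-second one.
fast = "59692.159: [GC (Allocation Failure) ... [Times: user=21.86 sys=0.00, real=5.45 secs]"
slow = "59779.293: [GC (Allocation Failure) ... [Times: user=393.93 sys=0.00, real=98.35 secs]"

for line in (fast, slow):
    print(round(cpu_ratio(line), 1))  # both land near 4.0: the GC threads stayed busy
```

A ratio collapsing toward 1.0 (or below) during the long pauses would instead have pointed at stalls such as swapping, blocking log writes, or hypervisor steal; a steady ratio near the GC thread count says the collector was doing real work, or all its threads were being throttled evenly.]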
