Hello, I'd like to solicit some help with a weird GC issue we are seeing.

I'm trying to debug OOM on a service we are running in k8s. The service is
just a CRUD server hitting a database (DynamoDB). Each replica serves about
300 qps of traffic. There are no memory leaks. On occasion (seemingly
correlated to small latency spikes on the backend), the service would OOM.
This is surprising because it has a circuit breaker that drops requests
after 200 concurrent connections that has never trips, and goroutine
profiles confirm that there are nowhere 200 active goroutines.

GC logs are pasted below. It's interlaced with dumps of runtime.Memstats
(the RSS number is coming from /proc/<pid>/stats). Go version is 1.12.5,
running an Alpine 3.10 container in an Amazon kernel
4.14.123-111.109.amzn2.x86_64.

The service happily serves requests using ~50MB of RSS for hours, until the
last 2 seconds, where GC mark&sweep time starts to 2-4X per cycle
(43+489/158/0.60+0.021
ms cpu => 43+489/158/0.60+0.021 ms cpu), and RSS and Sys blow up. It’s also
interesting that in the last log line: `Sys=995MB RSS=861MB HeapSys=199MB`.
If I’m reading this correctly, there’s at least `662MB` of memory in RSS
that is not assigned to the heap. Though this might be due to the change in
1.125 to use MADV_FREE, so the pages are freeable not yet reclaimed by the
kernel.

I don’t understand how heap can be so small across gc cycles (28->42->30MB
on the last line means heap doesn't grow past 42MB?), yet RSS keeps
growing. I'm assuming the increased RSS is causing the kernel to OOM the
service, but that should only happen if the RSS is not freeable as marked
by MADV_FREE. There doesn't seem to be any indication of that from the GC
logs. I guess this all comes down to me not having a good understanding of
how the GC algorithm works and how to read these logs. I'd really
appreciate it if anyone can explain what's happening and why.

gc 41833 @19135.227s 0%: 0.019+2.3+0.005 ms clock, 0.079+0.29/2.2/5.6+0.020
ms cpu, 11->11->5 MB, 12 MB goal, 4 P
INFO 2019-06-30T08:46:04.886 [Memory]: Alloc=7MB TotalAlloc=230172MB
Sys=69MB RSS=51MB HeapSys=62MB HeapIdle=51MB HeapInUse=11MB HeapReleased=5MB
gc 41834 @19135.869s 0%: 0.005+2.9+0.003 ms clock, 0.023+0.32/2.5/6.6+0.012
ms cpu, 11->11->5 MB, 12 MB goal, 4 P
INFO 2019-06-30T08:46:05.886 [Memory]: Alloc=9MB TotalAlloc=230179MB
Sys=69MB RSS=51MB HeapSys=62MB HeapIdle=50MB HeapInUse=12MB HeapReleased=5MB
gc 41835 @19136.704s 0%: 0.038+2.1+0.004 ms clock, 0.15+0.35/2.1/5.3+0.016
ms cpu, 11->11->5 MB, 12 MB goal, 4 P
INFO 2019-06-30T08:46:06.886 [Memory]: Alloc=9MB TotalAlloc=230184MB
Sys=69MB RSS=51MB HeapSys=62MB HeapIdle=50MB HeapInUse=12MB HeapReleased=5MB
gc 41836 @19137.611s 0%: 0.009+2.1+0.003 ms clock, 0.036+0.39/2.0/5.7+0.015
ms cpu, 11->11->5 MB, 12 MB goal, 4 P
INFO 2019-06-30T08:46:07.887 [Memory]: Alloc=10MB TotalAlloc=230190MB
Sys=69MB RSS=51MB HeapSys=62MB HeapIdle=49MB HeapInUse=12MB HeapReleased=5MB
gc 41837 @19138.444s 0%: 0.008+2.1+0.004 ms clock, 0.035+0.51/2.1/5.7+0.017
ms cpu, 11->11->5 MB, 12 MB goal, 4 P
INFO 2019-06-30T08:46:08.887 [Memory]: Alloc=10MB TotalAlloc=230195MB
Sys=69MB RSS=51MB HeapSys=62MB HeapIdle=49MB HeapInUse=12MB HeapReleased=5MB
gc 41838 @19139.474s 0%: 0.005+2.6+0.003 ms clock, 0.023+0.37/2.5/4.3+0.014
ms cpu, 11->11->5 MB, 12 MB goal, 4 P
gc 41839 @19140.173s 0%: 0.011+2.4+0.003 ms clock, 0.046+0.20/2.3/5.8+0.015
ms cpu, 11->11->5 MB, 12 MB goal, 4 P
INFO 2019-06-30T08:46:09.887 [Memory]: Alloc=7MB TotalAlloc=230202MB
Sys=69MB RSS=51MB HeapSys=62MB HeapIdle=50MB HeapInUse=11MB HeapReleased=5MB
gc 41840 @19140.831s 0%: 0.082+2.1+0.003 ms clock, 0.32+0.64/2.1/5.3+0.014
ms cpu, 11->11->5 MB, 12 MB goal, 4 P
INFO 2019-06-30T08:46:10.887 [Memory]: Alloc=9MB TotalAlloc=230209MB
Sys=69MB RSS=51MB HeapSys=62MB HeapIdle=50MB HeapInUse=12MB HeapReleased=5MB
gc 41841 @19141.655s 0%: 0.014+2.1+0.003 ms clock, 0.056+0.28/2.0/5.7+0.013
ms cpu, 11->11->5 MB, 12 MB goal, 4 P
gc 41842 @19142.316s 0%: 0.006+2.7+0.003 ms clock, 0.027+0.29/2.6/6.2+0.014
ms cpu, 11->11->5 MB, 12 MB goal, 4 P
INFO 2019-06-30T08:46:11.888 [Memory]: Alloc=6MB TotalAlloc=230216MB
Sys=69MB RSS=51MB HeapSys=62MB HeapIdle=51MB HeapInUse=11MB HeapReleased=5MB
gc 41843 @19142.942s 0%: 0.010+2.1+0.005 ms clock, 0.040+0.29/2.0/5.7+0.023
ms cpu, 11->11->5 MB, 12 MB goal, 4 P
INFO 2019-06-30T08:46:12.888 [Memory]: Alloc=9MB TotalAlloc=230223MB
Sys=69MB RSS=51MB HeapSys=62MB HeapIdle=50MB HeapInUse=11MB HeapReleased=5MB
gc 41844 @19143.724s 0%: 0.008+2.4+0.004 ms clock, 0.035+0.38/2.0/5.7+0.017
ms cpu, 11->11->5 MB, 12 MB goal, 4 P
gc 41845 @19144.380s 0%: 10+9.3+0.044 ms clock, 43+6.1/9.2/4.4+0.17 ms cpu,
11->11->6 MB, 12 MB goal, 4 P
INFO 2019-06-30T08:46:13.901 [Memory]: Alloc=6MB TotalAlloc=230230MB
Sys=136MB RSS=98MB HeapSys=94MB HeapIdle=83MB HeapInUse=11MB
HeapReleased=35MB
gc 41846 @19144.447s 0%: 0.008+26+0.005 ms clock, 0.033+0.46/7.8/26+0.020
ms cpu, 11->12->9 MB, 12 MB goal, 4 P
gc 41847 @19144.672s 0%: 0.013+76+0.006 ms clock, 0.053+0.20/6.4/80+0.024
ms cpu, 17->18->8 MB, 18 MB goal, 4 P
gc 41848 @19145.014s 0%: 0.008+172+0.005 ms clock, 0.035+0.13/8.5/177+0.022
ms cpu, 15->17->10 MB, 16 MB goal, 4 P
gc 41849 @19145.298s 0%: 0.007+285+0.006 ms clock, 0.030+10/285/7.6+0.024
ms cpu, 19->23->15 MB, 20 MB goal, 4 P
INFO 2019-06-30T08:46:15.052 [Memory]: Alloc=22MB TotalAlloc=230264MB
Sys=598MB RSS=531MB HeapSys=265MB HeapIdle=240MB HeapInUse=25MB
HeapReleased=164MB
gc 41850 @19145.665s 0%: 10+419+0.005 ms clock, 43+489/158/0.60+0.021 ms
cpu, 26->30->17 MB, 30 MB goal, 4 P
gc 41851 @19146.325s 0%: 21+798+0.036 ms clock, 86+990/401/0+0.14 ms cpu,
28->42->30 MB, 34 MB goal, 4 P
INFO 2019-06-30T08:46:16.613 [Memory]: Alloc=41MB TotalAlloc=230303MB
Sys=995MB RSS=861MB HeapSys=199MB HeapIdle=155MB HeapInUse=44MB
HeapReleased=54MB

I also captured the OOM log from dmesg here
https://gist.github.com/mightyguava/7ecc6fc55f5cd925062d6beede3783b3.

-- 
Yunchi

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CANnT9sj1_sZCKDkGbkzarwcn8DYEX9OS6Ack%2B71613eyLQ7y6w%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to