Hi,
I have a microservice application deployed on a Kubernetes cluster (with a
1 GB pod memory limit). The app receives a continuous stream of messages
(500 messages per second) from a producer app over a Kafka interface; the
messages are encoded in protobuf format.
*Basic application flow:*
1. Get the messages one by one from Kafka
2. Unmarshal the proto message
3. Apply business logic
4. Write the message to the Redis cache (as []byte)
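To make the flow concrete, here is a minimal, self-contained sketch of those four steps. The type and function names are hypothetical, and JSON stands in for the protobuf codec and the Kafka/Redis clients so the snippet runs on its own; the real app would use the generated proto type with proto.Unmarshal and the go-redis client instead.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Message stands in for the decoded protobuf payload; the real app
// would use the generated proto type here.
type Message struct {
	ID   string `json:"id"`
	Data string `json:"data"`
}

// process mirrors the four steps for one Kafka message: take the raw
// bytes, unmarshal, apply business logic, and return the []byte that
// would be written to the Redis cache.
func process(raw []byte) ([]byte, error) {
	var m Message
	if err := json.Unmarshal(raw, &m); err != nil { // stand-in for proto.Unmarshal
		return nil, err
	}
	m.Data = m.Data + "-processed" // placeholder business logic
	return json.Marshal(m)         // []byte destined for Redis
}

func main() {
	out, err := process([]byte(`{"id":"1","data":"payload"}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```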
When the pod starts, memory is around 50 MB, and it increases as traffic
flows into the application. It is never released back to the OS, and as a
result the pod restarts with error code *OOMKilled*.
I have integrated Grafana to watch memory usage (RSS, heap, stack).
Under this traffic, the in-use heap is 80 MB and the idle heap is 80 MB,
whereas the process resident memory is 800-1000 MB. Stopping the traffic
completely for hours did not help; RSS remained around 1000 MB.
I analyzed this with pprof and it reports only 80 MB in the in-use
section, so I am wondering where the remaining 800-1000 MB of the pod's
memory went. The application also allocates slices/maps/strings to perform
the business logic (see the alloc_space pprof output below).
I tried a couple of experiments:
1. Calling debug.FreeOSMemory() in the app; it did not help.
2. Invoking the app with GODEBUG=madvdontneed=1 my_app_executable; it did
not help.
3. Leaving the application idle for 5-6 hours without any traffic to see
whether memory comes down; it did not.
4. pprof shows only 80 MB of heap in use.
5. Upgrading Go from 1.13 to 1.16, since there were some runtime
improvements; it did not help.
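For reference, experiment 1 can be reproduced in isolation with a standalone program like the one below (the helper name is mine). It allocates and drops a large slice, forces debug.FreeOSMemory, and reports HeapReleased. One caveat relevant to experiment 2: on Go 1.12-1.15 the runtime releases memory with MADV_FREE by default, so the kernel may not shrink RSS until there is memory pressure, whereas GODEBUG=madvdontneed=1 makes RSS drop immediately; Go 1.16 returned to MADV_DONTNEED by default.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// releaseIdleHeap allocates ~64 MB, drops the reference, forces the
// runtime to return idle spans to the OS, and reports HeapReleased.
func releaseIdleHeap() uint64 {
	big := make([]byte, 64<<20) // 64 MB of soon-to-be garbage
	_ = big
	big = nil

	debug.FreeOSMemory() // forces a GC, then returns idle heap to the OS

	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	return ms.HeapReleased
}

func main() {
	fmt.Printf("HeapReleased after FreeOSMemory: %d MB\n", releaseIdleHeap()>>20)
}
```

Watching RSS of this program from outside (e.g. with ps) shows whether the pages actually go back to the kernel on your platform.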
pprof output for *alloc_space*:
(pprof) top20
Showing nodes accounting for 481.98GB, 91.57% of 526.37GB total
Dropped 566 nodes (cum <= 2.63GB)
Showing top 20 nodes out of 114
      flat  flat%   sum%        cum   cum%
   78.89GB 14.99% 14.99%    78.89GB 14.99%  github.com/go-redis/redis/v7/internal/proto.(*Reader).readStringReply
   67.01GB 12.73% 27.72%   285.33GB 54.21%  airgroup/internal/wrapper/agrediswrapper.GetAllConfigurationForGroups
   58.75GB 11.16% 38.88%    58.75GB 11.16%  google.golang.org/protobuf/internal/impl.(*MessageInfo).MessageOf
   52.26GB  9.93% 48.81%    52.26GB  9.93%  reflect.unsafe_NewArray
   45.78GB  8.70% 57.50%    46.38GB  8.81%  encoding/json.(*decodeState).literalStore
   36.98GB  7.02% 64.53%    36.98GB  7.02%  reflect.New
   28.20GB  5.36% 69.89%    28.20GB  5.36%  gopkg.in/confluentinc/confluent-kafka-go.v1/kafka._Cfunc_GoBytes
   25.60GB  4.86% 74.75%    63.62GB 12.09%  google.golang.org/protobuf/proto.MarshalOptions.marshal
   12.79GB  2.43% 77.18%   165.56GB 31.45%  encoding/json.(*decodeState).object
   12.73GB  2.42% 79.60%    12.73GB  2.42%  reflect.mapassign
   11.05GB  2.10% 81.70%    63.31GB 12.03%  reflect.MakeSlice
   10.06GB  1.91% 83.61%    12.36GB  2.35%  filterServersForDestinationDevicesAndSendToDistributionChan
    6.92GB  1.32% 84.92%   309.45GB 58.79%  groupAndSendToConfigPolicyChannel
    6.79GB  1.29% 86.21%    48.85GB  9.28%  publishInternalMsgToDistributionService
    6.79GB  1.29% 87.50%   174.81GB 33.21%  encoding/json.Unmarshal
    6.14GB  1.17% 88.67%     6.14GB  1.17%  google.golang.org/protobuf/internal/impl.consumeBytes
    4.64GB  0.88% 89.55%    14.39GB  2.73%  GetAllDevDataFromGlobalDevDataDb
    4.11GB  0.78% 90.33%    18.47GB  3.51%  GetAllServersFromServerRecordDb
    3.27GB  0.62% 90.95%     3.27GB  0.62%  net.HardwareAddr.String
    3.23GB  0.61% 91.57%     3.23GB  0.61%  reflect.makemap
(pprof)
I need the experts' help in analyzing this issue.
Thanks in advance!
--
You received this message because you are subscribed to the Google Groups
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/golang-nuts/372b8f8f-d276-44a4-a6e2-e2b7ad393eb9n%40googlegroups.com.