I think your best course of action is to go to the Kafka forums. 

> On Mar 10, 2022, at 10:12 PM, Rakesh K R <rakeshkr1...@gmail.com> wrote:
> 
> Hi,
> Thank you. I know this is a Kafka-related question, but the thread started with an 
> issue in Go, and later we began to suspect an issue in the Go Kafka library 
> configuration.
> FYI, I am interested in Kafka client or producer properties. log.retention.hours 
> and log.retention.bytes are Kafka broker configuration, so I am confused now.
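> 
> If I understand the split correctly, here is a minimal sketch of where each kind 
> of setting lives with confluent-kafka-go (the broker address and values below are 
> placeholders, not our real config):
> 
> package main
> 
> import (
>     "log"
> 
>     "gopkg.in/confluentinc/confluent-kafka-go.v1/kafka"
> )
> 
> func main() {
>     // Client/producer properties (linger.ms, message.timeout.ms, ...) are
>     // set here, in the ConfigMap handed to the Go client.
>     p, err := kafka.NewProducer(&kafka.ConfigMap{
>         "bootstrap.servers":  "broker:9092", // placeholder
>         "linger.ms":          1000,
>         "message.timeout.ms": 300000,
>     })
>     if err != nil {
>         log.Fatal(err)
>     }
>     defer p.Close()
> 
>     // log.retention.hours / log.retention.bytes are broker settings: they
>     // live in the broker's server.properties (or per topic as retention.ms /
>     // retention.bytes), not in this ConfigMap.
> }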
> 
>> On Friday, March 11, 2022 at 1:34:23 AM UTC+5:30 ren...@ix.netcom.com wrote:
>> Look at log.retention.hours and log.retention.bytes
>> 
>> You should post this in the Kafka forums, not the Go ones. 
>> 
>>>> On Mar 10, 2022, at 11:04 AM, Rakesh K R <rakesh...@gmail.com> wrote:
>>>> 
>>> Hi,
>>> 
>>> Sorry, I am not sure which Kafka configuration you are referring to here. Can 
>>> you please point me to the right configuration responsible for retaining 
>>> messages for replay?
>>> I see the following properties, which might be related, but I am not sure:
>>> queued.min.messages
>>> queued.max.messages.kbytes
>>> queue.buffering.max.messages
>>> queue.buffering.max.kbytes
>>> linger.ms ---> this is currently set to 1000
>>> message.timeout.ms
>>> 
>>> Thank you
>>> 
>>>> On Thursday, March 10, 2022 at 9:50:50 PM UTC+5:30 ren...@ix.netcom.com wrote:
>>>> You need to configure Kafka for how long it retains messages for replay, or use 
>>>> some other option to store them on disk. 
>>>> 
>>>>>> On Mar 10, 2022, at 10:07 AM, Rakesh K R <rakesh...@gmail.com> wrote:
>>>>>> 
>>>>> Tamas,
>>>>> 
>>>>> Thank you. Any suggestion on how to make the application release this 900MiB 
>>>>> of memory back to the OS so that the pod does not end up in the OOMKilled state?
>>>>> 
>>>>>>> On Thursday, March 10, 2022 at 1:45:18 PM UTC+5:30 Tamás Gulácsi wrote:
>>>>>>> gopkg.in/confluentinc/confluent-kafka-go.v1/kafka._Cfunc_GoBytes
>>>>>>> 
>>>>>>> says it uses cgo, hiding its memory usage from Go. I bet that 900MiB 
>>>>>>> of memory is there...
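>>>>>>> 
>>>>>>> A tiny standalone sketch (nothing from your app) of what "hidden from Go" 
>>>>>>> means: memory obtained through the C allocator shows up in the process RSS, 
>>>>>>> but Go's heap statistics and pprof never see it; C.GoBytes only copies it 
>>>>>>> into the Go heap afterwards.
>>>>>>> 
>>>>>>> package main
>>>>>>> 
>>>>>>> // #include <stdlib.h>
>>>>>>> import "C"
>>>>>>> 
>>>>>>> import (
>>>>>>>     "fmt"
>>>>>>>     "runtime"
>>>>>>> )
>>>>>>> 
>>>>>>> func main() {
>>>>>>>     // 100 MiB allocated by the C allocator: RSS grows, but the Go runtime
>>>>>>>     // (and therefore pprof's heap profile) never accounts for it.
>>>>>>>     p := C.calloc(1, 100<<20)
>>>>>>>     defer C.free(p)
>>>>>>> 
>>>>>>>     // C.GoBytes copies C memory into a Go []byte; only that copy is
>>>>>>>     // tracked by the Go runtime, the C-side buffer is not.
>>>>>>>     b := C.GoBytes(p, 1024)
>>>>>>> 
>>>>>>>     var m runtime.MemStats
>>>>>>>     runtime.ReadMemStats(&m)
>>>>>>>     fmt.Println(len(b), m.HeapAlloc>>20) // HeapAlloc (in MiB) stays tiny
>>>>>>> }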
>>>>>>> 
>>>>>>> 
>>>>>>> Rakesh K R wrote the following (Thursday, March 10, 2022, 7:26:57 UTC+1):
>>>>>>>> Hi,
>>>>>>>> I have a microservice application deployed on a Kubernetes cluster (with a 
>>>>>>>> 1 GB pod memory limit). This app receives a continuous stream of messages 
>>>>>>>> (500 messages per second) from a producer app over a Kafka interface (the 
>>>>>>>> messages are encoded in protobuf format).
>>>>>>>> 
>>>>>>>> Basic application flow (see the sketch after this list):
>>>>>>>> 1. Get the messages one by one from Kafka
>>>>>>>> 2. Unmarshal the proto message
>>>>>>>> 3. Apply the business logic
>>>>>>>> 4. Write the message to the Redis cache (in []byte format)
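>>>>>>>> 
>>>>>>>> Roughly, the loop looks like this (a simplified sketch with placeholder 
>>>>>>>> addresses, group, and topic names, not the real code; the proto unmarshal 
>>>>>>>> and business logic are only indicated by comments):
>>>>>>>> 
>>>>>>>> package main
>>>>>>>> 
>>>>>>>> import (
>>>>>>>>     "log"
>>>>>>>> 
>>>>>>>>     "github.com/go-redis/redis/v7"
>>>>>>>>     "gopkg.in/confluentinc/confluent-kafka-go.v1/kafka"
>>>>>>>> )
>>>>>>>> 
>>>>>>>> func main() {
>>>>>>>>     c, err := kafka.NewConsumer(&kafka.ConfigMap{
>>>>>>>>         "bootstrap.servers": "broker:9092", // placeholder
>>>>>>>>         "group.id":          "my-group",    // placeholder
>>>>>>>>     })
>>>>>>>>     if err != nil {
>>>>>>>>         log.Fatal(err)
>>>>>>>>     }
>>>>>>>>     defer c.Close()
>>>>>>>>     if err := c.SubscribeTopics([]string{"my-topic"}, nil); err != nil { // placeholder topic
>>>>>>>>         log.Fatal(err)
>>>>>>>>     }
>>>>>>>> 
>>>>>>>>     rdb := redis.NewClient(&redis.Options{Addr: "redis:6379"}) // placeholder
>>>>>>>> 
>>>>>>>>     for {
>>>>>>>>         msg, err := c.ReadMessage(-1) // 1. get the next message from Kafka
>>>>>>>>         if err != nil {
>>>>>>>>             continue
>>>>>>>>         }
>>>>>>>>         // 2. proto.Unmarshal into the app-specific message type
>>>>>>>>         // 3. apply the business logic
>>>>>>>>         // 4. write the resulting bytes to the Redis cache
>>>>>>>>         rdb.Set(string(msg.Key), msg.Value, 0)
>>>>>>>>     }
>>>>>>>> }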
>>>>>>>> 
>>>>>>>> When the pod starts, memory is around 50 MB, and it keeps increasing as 
>>>>>>>> traffic flows into the application. It is never released back to the OS, 
>>>>>>>> and as a result the pod restarts with error code OOMKilled.
>>>>>>>> I have integrated Grafana to watch memory usage such as RSS, heap, and stack.
>>>>>>>> During this traffic flow, the in-use heap size is 80 MB and the idle heap is 
>>>>>>>> 80 MB, whereas the process resident memory is at 800-1000 MB. Stopping the 
>>>>>>>> traffic completely for hours did not help; RSS stays around 1000 MB.
>>>>>>>> I tried to analyze this with pprof, and it reports only 80 MB in the in-use 
>>>>>>>> section, so I am wondering where the remaining 800-1000 MB of the pod's 
>>>>>>>> memory went. The application also allocates memory such as slices, maps, and 
>>>>>>>> strings to perform its business logic (see the alloc_space pprof output below).
>>>>>>>> 
>>>>>>>> I tried a couple of experiments:
>>>>>>>> 1. Calling debug.FreeOSMemory() in the app (see the sketch after this list), 
>>>>>>>> but that did not help
>>>>>>>> 2. Invoking my app with GODEBUG=madvdontneed=1 my_app_executable; it did not 
>>>>>>>> help
>>>>>>>> 3. Leaving the application for 5-6 hours without any traffic to see whether 
>>>>>>>> memory comes down; it did not help
>>>>>>>> 4. pprof shows only 80 MB of heap in use
>>>>>>>> 5. Upgrading the Go version from 1.13 to 1.16, as there were some runtime 
>>>>>>>> improvements; it did not help
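>>>>>>>> 
>>>>>>>> For reference, a minimal sketch (not the actual service code) of the kind of 
>>>>>>>> check behind experiments 1 and 3: force the runtime to return memory to the 
>>>>>>>> OS and print its heap counters. cgo allocations never show up in these 
>>>>>>>> numbers, so they can only appear as the gap between them and the pod's RSS.
>>>>>>>> 
>>>>>>>> package main
>>>>>>>> 
>>>>>>>> import (
>>>>>>>>     "fmt"
>>>>>>>>     "runtime"
>>>>>>>>     "runtime/debug"
>>>>>>>> )
>>>>>>>> 
>>>>>>>> func main() {
>>>>>>>>     // Force a GC and return as much heap memory as possible to the OS.
>>>>>>>>     debug.FreeOSMemory()
>>>>>>>> 
>>>>>>>>     var m runtime.MemStats
>>>>>>>>     runtime.ReadMemStats(&m)
>>>>>>>>     fmt.Printf("HeapInuse=%dMiB HeapIdle=%dMiB HeapReleased=%dMiB Sys=%dMiB\n",
>>>>>>>>         m.HeapInuse>>20, m.HeapIdle>>20, m.HeapReleased>>20, m.Sys>>20)
>>>>>>>> }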
>>>>>>>> 
>>>>>>>> pprof output for alloc_space:
>>>>>>>> 
>>>>>>>> (pprof) top20
>>>>>>>> Showing nodes accounting for 481.98GB, 91.57% of 526.37GB total
>>>>>>>> Dropped 566 nodes (cum <= 2.63GB)
>>>>>>>> Showing top 20 nodes out of 114
>>>>>>>>       flat  flat%   sum%        cum   cum%
>>>>>>>>    78.89GB 14.99% 14.99%    78.89GB 14.99%  github.com/go-redis/redis/v7/internal/proto.(*Reader).readStringReply
>>>>>>>>    67.01GB 12.73% 27.72%   285.33GB 54.21%  airgroup/internal/wrapper/agrediswrapper.GetAllConfigurationForGroups
>>>>>>>>    58.75GB 11.16% 38.88%    58.75GB 11.16%  google.golang.org/protobuf/internal/impl.(*MessageInfo).MessageOf
>>>>>>>>    52.26GB  9.93% 48.81%    52.26GB  9.93%  reflect.unsafe_NewArray
>>>>>>>>    45.78GB  8.70% 57.50%    46.38GB  8.81%  encoding/json.(*decodeState).literalStore
>>>>>>>>    36.98GB  7.02% 64.53%    36.98GB  7.02%  reflect.New
>>>>>>>>    28.20GB  5.36% 69.89%    28.20GB  5.36%  gopkg.in/confluentinc/confluent-kafka-go.v1/kafka._Cfunc_GoBytes
>>>>>>>>    25.60GB  4.86% 74.75%    63.62GB 12.09%  google.golang.org/protobuf/proto.MarshalOptions.marshal
>>>>>>>>    12.79GB  2.43% 77.18%   165.56GB 31.45%  encoding/json.(*decodeState).object
>>>>>>>>    12.73GB  2.42% 79.60%    12.73GB  2.42%  reflect.mapassign
>>>>>>>>    11.05GB  2.10% 81.70%    63.31GB 12.03%  reflect.MakeSlice
>>>>>>>>    10.06GB  1.91% 83.61%    12.36GB  2.35%  filterServersForDestinationDevicesAndSendToDistributionChan
>>>>>>>>     6.92GB  1.32% 84.92%   309.45GB 58.79%  groupAndSendToConfigPolicyChannel
>>>>>>>>     6.79GB  1.29% 86.21%    48.85GB  9.28%  publishInternalMsgToDistributionService
>>>>>>>>     6.79GB  1.29% 87.50%   174.81GB 33.21%  encoding/json.Unmarshal
>>>>>>>>     6.14GB  1.17% 88.67%     6.14GB  1.17%  google.golang.org/protobuf/internal/impl.consumeBytes
>>>>>>>>     4.64GB  0.88% 89.55%    14.39GB  2.73%  GetAllDevDataFromGlobalDevDataDb
>>>>>>>>     4.11GB  0.78% 90.33%    18.47GB  3.51%  GetAllServersFromServerRecordDb
>>>>>>>>     3.27GB  0.62% 90.95%     3.27GB  0.62%  net.HardwareAddr.String
>>>>>>>>     3.23GB  0.61% 91.57%     3.23GB  0.61%  reflect.makemap
>>>>>>>> (pprof)
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I need the experts' help in analyzing this issue.
>>>>>>>> 
>>>>>>>> Thanks in advance!!