[ https://issues.apache.org/jira/browse/KAFKA-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16841748#comment-16841748 ]

Sophie Blee-Goldman commented on KAFKA-8367:
--------------------------------------------

Hm. I notice that in going from 2.0 to 2.1 we upgraded RocksDB from 5.7.3 to 
5.15.10, and I would like to rule out the possibility that the leak is in 
RocksDB itself. If we can test the older version of RocksDB with the newer 
version of Streams, that should help us isolate the problem. I opened a quick 
branch off 2.2 with RocksDB downgraded to v5.7; can you build from 
[this branch|https://github.com/ableegoldman/kafka/tree/testRocksDBleak] and 
see if the leak is still present? 
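
If it helps narrow things down, here is a quick sanity check (just a sketch, 
not from the ticket) to confirm which rocksdbjni jar actually ended up on the 
classpath after rebuilding, since the version is part of the jar name:

{code:java}
import org.rocksdb.RocksDB;

public class RocksVersionCheck {
    public static void main(final String[] args) {
        // Path of the rocksdbjni jar that was actually loaded,
        // e.g. .../rocksdbjni-5.15.10.jar vs a downgraded build
        System.out.println(
                RocksDB.class.getProtectionDomain().getCodeSource().getLocation());

        // Implementation-Version from the jar manifest, if one is present
        System.out.println(RocksDB.class.getPackage().getImplementationVersion());
    }
}
{code}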

Did your RocksDBConfigSetter set the same configs before 2.2.0, or did any of 
those change? I agree your ConfigSetter shouldn't be leaking; I'm just trying 
to get all the details. It might also be worth investigating whether the leak 
has been present since 2.1 or only appears in 2.2.
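
For reference, a setter matching the values in the description would look 
roughly like the sketch below (class name and exact calls are my guess, not 
your code), registered via StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG. 
One thing to double-check: if your setter allocates RocksDB objects (e.g. a 
Cache or BloomFilter) rather than only calling setters like these, the 2.2 API 
gives Streams no hook to close them, which is one way a setter can leak even 
when the code looks correct.

{code:java}
import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Options;

// Illustrative only: wired up to the values listed in the description
public class ReportedRocksDBConfigSetter implements RocksDBConfigSetter {

    @Override
    public void setConfig(final String storeName,
                          final Options options,
                          final Map<String, Object> configs) {
        final BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
        tableConfig.setBlockCacheSize(4 * 1024 * 1024L);   // 4 MiB block cache
        tableConfig.setBlockSize(16 * 1024L);              // 16 KiB block size
        tableConfig.setCacheIndexAndFilterBlocks(true);
        options.setTableFormatConfig(tableConfig);

        options.setWriteBufferSize(2 * 1024 * 1024L);      // 2 MiB write buffer
        options.setMaxWriteBufferNumber(3);
        options.setManifestPreallocationSize(64 * 1024L);  // 64 KiB
        options.setMaxOpenFiles(6144);
    }
}
{code}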

> Non-heap memory leak in Kafka Streams
> -------------------------------------
>
>                 Key: KAFKA-8367
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8367
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.2.0
>            Reporter: Pavel Savov
>            Priority: Major
>         Attachments: memory-prod.png, memory-test.png
>
>
> We have been observing a non-heap memory leak after upgrading to Kafka 
> Streams 2.2.0 from 2.0.1. We suspect the source is around RocksDB, as the 
> leak only happens when we enable stateful stream operations (utilizing 
> stores). We are aware of *KAFKA-8323*, so we created our own fork of 2.2.0 
> and ported the fix scheduled for release in 2.2.1 to it; this did not stop 
> the leak, however.
> We are seeing this memory leak both in our production environment, where the 
> consumer group is auto-scaled in and out in response to changes in traffic 
> volume, and in our test environment, where we have two consumers, no 
> autoscaling, and relatively constant traffic.
> Below is some information I'm hoping will be of help:
>  * RocksDB Config:
> Block cache size: 4 MiB
> Write buffer size: 2 MiB
> Block size: 16 KiB
> Cache index and filter blocks: true
> Manifest preallocation size: 64 KiB
> Max write buffer number: 3
> Max open files: 6144
>  
>  * Memory usage in production
> The attached graph (memory-prod.png) shows memory consumption for each 
> instance as a separate line. The horizontal red line at 6 GiB is the memory 
> limit.
> As illustrated in the attached graph from production, memory consumption in 
> running instances goes up around autoscaling events (scaling the consumer 
> group either in or out) and the associated rebalancing. It stabilizes until 
> the next autoscaling event but never goes back down.
> An example of scaling out can be seen from around 21:00, when three new 
> instances are started in response to a traffic spike.
> Just after midnight traffic drops and some instances are shut down. Memory 
> consumption in the remaining running instances goes up.
> Memory consumption climbs again from around 6:00 AM due to increased traffic, 
> and new instances continue to be started until around 10:30 AM. Memory 
> consumption does not drop until the cluster is restarted around 12:30.
>  
>  * Memory usage in test
> As illustrated in the attached graph (memory-test.png), we have a fixed set 
> of two instances in our test environment and no autoscaling. Memory 
> consumption rises linearly until it reaches the limit (around 2:00 AM on 
> 5/13), at which point Mesos restarts the offending instances, or we restart 
> the cluster manually.
>  
>  * No heap leaks observed
>  * Window retention: 2 or 11 minutes (depending on operation type)
>  * Issue not present in Kafka Streams 2.0.1
>  * No memory leak for stateless stream operations (when no RocksDB stores are 
> used)
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
