[ https://issues.apache.org/jira/browse/KAFKA-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16847768#comment-16847768 ]
Pavel Savov commented on KAFKA-8367:
------------------------------------

Below are dumps of our topologies:

TOPOLOGY 1:
{noformat}
Topologies:
   Sub-topology: 0
    Source: KSTREAM-SOURCE-0000000000 (topics: [pl.allegro.analytics.page_view_raw])
      --> KSTREAM-MAP-0000000001
    Processor: KSTREAM-MAP-0000000001 (stores: [])
      --> KSTREAM-TRANSFORM-0000000002
      <-- KSTREAM-SOURCE-0000000000
    Processor: KSTREAM-TRANSFORM-0000000002 (stores: [page_view_raw_deduplication_store])
      --> KSTREAM-SINK-0000000003
      <-- KSTREAM-MAP-0000000001
    Sink: KSTREAM-SINK-0000000003 (topic: pl.allegro.analytics.page_view_raw_by_pv_id)
      <-- KSTREAM-TRANSFORM-0000000002
{noformat}
1 state store (window size: 1 minute, window retention: 2 minutes),
topic pl.allegro.analytics.page_view_raw: 64 partitions, retention: 72 hours;
topic pl.allegro.analytics.page_view_raw_by_pv_id: 16 partitions, retention: 3 hours

TOPOLOGY 2:
{noformat}
Topologies:
   Sub-topology: 0
    Source: KSTREAM-SOURCE-0000000000 (topics: [pl.allegro.analytics.event_raw])
      --> KSTREAM-MAP-0000000001
    Processor: KSTREAM-MAP-0000000001 (stores: [])
      --> KSTREAM-TRANSFORM-0000000002
      <-- KSTREAM-SOURCE-0000000000
    Processor: KSTREAM-TRANSFORM-0000000002 (stores: [event_raw_deduplication_store])
      --> KSTREAM-SINK-0000000003
      <-- KSTREAM-MAP-0000000001
    Sink: KSTREAM-SINK-0000000003 (topic: pl.allegro.analytics.event_raw_by_pv_id_local_pavel.savov)
      <-- KSTREAM-TRANSFORM-0000000002
{noformat}
1 state store (window size: 1 minute, window retention: 2 minutes),
topic pl.allegro.analytics.event_raw: 64 partitions, retention: 72 hours;
topic pl.allegro.analytics.event_raw_by_pv_id: 16 partitions, retention: 3 hours

TOPOLOGY 3:
{noformat}
Topologies:
   Sub-topology: 0
    Source: KSTREAM-SOURCE-0000000000 (topics: [pl.allegro.analytics.event_raw_by_pv_id])
      --> KSTREAM-FILTER-0000000002
    Processor: KSTREAM-FILTER-0000000002 (stores: [])
      --> KSTREAM-TRANSFORM-0000000004
      <-- KSTREAM-SOURCE-0000000000
    Processor: KSTREAM-TRANSFORM-0000000004 (stores: [performance_windowed_store])
      --> KSTREAM-MAPVALUES-0000000005
      <-- KSTREAM-FILTER-0000000002
    Processor: KSTREAM-MAPVALUES-0000000005 (stores: [])
      --> KSTREAM-FOREACH-0000000006
      <-- KSTREAM-TRANSFORM-0000000004
    Source: KSTREAM-SOURCE-0000000001 (topics: [pl.allegro.analytics.page_view_raw_by_pv_id])
      --> KSTREAM-PROCESSOR-0000000003
    Processor: KSTREAM-FOREACH-0000000006 (stores: [])
      --> none
      <-- KSTREAM-MAPVALUES-0000000005
    Processor: KSTREAM-PROCESSOR-0000000003 (stores: [performance_windowed_store])
      --> none
      <-- KSTREAM-SOURCE-0000000001
{noformat}
1 state store (window size: 10 minutes, window retention: 11 minutes),
topic pl.allegro.analytics.event_raw_by_pv_id: 16 partitions, retention: 3 hours;
topic pl.allegro.analytics.page_view_raw_by_pv_id: 16 partitions, retention: 3 hours

TOPOLOGY 4:
{noformat}
Topologies:
   Sub-topology: 0
    Source: KSTREAM-SOURCE-0000000000 (topics: [pl.allegro.analytics.event_raw_by_pv_id])
      --> KSTREAM-FILTER-0000000002
    Processor: KSTREAM-FILTER-0000000002 (stores: [])
      --> KSTREAM-TRANSFORM-0000000004
      <-- KSTREAM-SOURCE-0000000000
    Processor: KSTREAM-TRANSFORM-0000000004 (stores: [opbox_boxes_windowed_store])
      --> KSTREAM-MAPVALUES-0000000005
      <-- KSTREAM-FILTER-0000000002
    Processor: KSTREAM-MAPVALUES-0000000005 (stores: [])
      --> KSTREAM-FOREACH-0000000006
      <-- KSTREAM-TRANSFORM-0000000004
    Source: KSTREAM-SOURCE-0000000001 (topics: [pl.allegro.analytics.page_view_raw_by_pv_id])
      --> KSTREAM-PROCESSOR-0000000003
    Processor: KSTREAM-FOREACH-0000000006 (stores: [])
      --> none
      <-- KSTREAM-MAPVALUES-0000000005
    Processor: KSTREAM-PROCESSOR-0000000003 (stores: [opbox_boxes_windowed_store])
      --> none
      <-- KSTREAM-SOURCE-0000000001
{noformat}
1 state store (window size: 10 minutes, window retention: 11 minutes),
topic pl.allegro.analytics.event_raw_by_pv_id: 16 partitions, retention: 3 hours;
topic pl.allegro.analytics.page_view_raw_by_pv_id: 16 partitions, retention: 3 hours

Please let me know if there is anything else that could help you with the investigation.

Thanks!
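For reference, the RocksDB settings listed in the issue description quoted below imply a rough memory budget per RocksDB instance. The following is only a minimal sketch, assuming the common rule of thumb that the configured memory of one instance is bounded by its block cache plus (max write buffer number x write buffer size), with index and filter blocks counted inside the block cache because "Cache index and filter blocks" is true. The class name, method names, and the instance count of 64 are hypothetical and chosen for illustration; they are not figures from the report.

```java
// Rough estimate only: assumes the block cache and memtables dominate the
// configured off-heap memory of a single RocksDB instance. It ignores
// allocator overhead, iterators, and any memory held outside the block cache.
final class RocksDbMemoryEstimate {
    static final long KIB = 1024L;
    static final long MIB = 1024L * KIB;

    // Upper bound for one instance: block cache + all write buffers (memtables).
    static long perInstanceBytes(long blockCacheBytes, long writeBufferBytes, int maxWriteBuffers) {
        return blockCacheBytes + writeBufferBytes * maxWriteBuffers;
    }

    // Scales the per-instance bound by a number of open RocksDB instances.
    static long totalBytes(long perInstanceBytes, int instanceCount) {
        return perInstanceBytes * instanceCount;
    }

    public static void main(String[] args) {
        // Values from the issue description: 4 MiB block cache,
        // 2 MiB write buffer, max write buffer number 3.
        long perInstance = perInstanceBytes(4 * MIB, 2 * MIB, 3);
        System.out.println("Per RocksDB instance: " + perInstance / MIB + " MiB");

        // Hypothetical instance count, for illustration only.
        int hypotheticalInstances = 64;
        long total = totalBytes(perInstance, hypotheticalInstances);
        System.out.println("For " + hypotheticalInstances + " instances: " + total / MIB + " MiB");
    }
}
```

Under these assumptions the bound is 10 MiB per instance. Non-heap growth that keeps rising well past such a bound would suggest memory held outside the configured limits, which is consistent with the leak described in the report but does not by itself identify the cause.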
> Non-heap memory leak in Kafka Streams
> -------------------------------------
>
>                 Key: KAFKA-8367
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8367
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.2.0
>            Reporter: Pavel Savov
>            Priority: Major
>         Attachments: memory-prod.png, memory-test.png
>
>
> We have been observing a non-heap memory leak after upgrading to Kafka
> Streams 2.2.0 from 2.0.1. We suspect the source to be around RocksDB as the
> leak only happens when we enable stateful stream operations (utilizing
> stores). We are aware of *KAFKA-8323* and have created our own fork of 2.2.0
> and ported the fix scheduled for release in 2.2.1 to our fork. It did not
> stop the leak, however.
> We are having this memory leak in our production environment where the
> consumer group is auto-scaled in and out in response to changes in traffic
> volume, and in our test environment where we have two consumers, no
> autoscaling and relatively constant traffic.
> Below is some information I'm hoping will be of help:
>  * RocksDB Config:
> Block cache size: 4 MiB
> Write buffer size: 2 MiB
> Block size: 16 KiB
> Cache index and filter blocks: true
> Manifest preallocation size: 64 KiB
> Max write buffer number: 3
> Max open files: 6144
>
>  * Memory usage in production
> The attached graph (memory-prod.png) shows memory consumption for each
> instance as a separate line. The horizontal red line at 6 GiB is the memory
> limit.
> As illustrated on the attached graph from production, memory consumption in
> running instances goes up around autoscaling events (scaling the consumer
> group either in or out) and associated rebalancing. It stabilizes until the
> next autoscaling event but it never goes back down.
> An example of scaling out can be seen from around 21:00 hrs where three new
> instances are started in response to a traffic spike.
> Just after midnight traffic drops and some instances are shut down. Memory
> consumption in the remaining running instances goes up.
> Memory consumption climbs again from around 6:00AM due to increased traffic
> and new instances are being started until around 10:30AM. Memory consumption
> never drops until the cluster is restarted around 12:30.
>
>  * Memory usage in test
> As illustrated by the attached graph (memory-test.png) we have a fixed number
> of two instances in our test environment and no autoscaling. Memory
> consumption rises linearly until it reaches the limit (around 2:00 AM on
> 5/13) and Mesos restarts the offending instances, or we restart the cluster
> manually.
>
>  * No heap leaks observed
>  * Window retention: 2 or 11 minutes (depending on operation type)
>  * Issue not present in Kafka Streams 2.0.1
>  * No memory leak for stateless stream operations (when no RocksDB stores are
> used)
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)