[ https://issues.apache.org/jira/browse/KAFKA-8367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16847768#comment-16847768 ]

Pavel Savov commented on KAFKA-8367:
------------------------------------

Below are dumps of our topologies:
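
They were produced with Topology#describe(), i.e. along the lines of:
{code:java}
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;

public class PrintTopology {

  public static void main(String[] args) {
    StreamsBuilder builder = new StreamsBuilder();
    // ... topology definition (see the sketches further down) ...
    Topology topology = builder.build();
    System.out.println(topology.describe());
  }
}
{code}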

 
 TOPOLOGY 1:
{noformat}
Topologies:
   Sub-topology: 0
    Source: KSTREAM-SOURCE-0000000000 (topics: [pl.allegro.analytics.page_view_raw])
      --> KSTREAM-MAP-0000000001
    Processor: KSTREAM-MAP-0000000001 (stores: [])
      --> KSTREAM-TRANSFORM-0000000002
      <-- KSTREAM-SOURCE-0000000000
    Processor: KSTREAM-TRANSFORM-0000000002 (stores: [page_view_raw_deduplication_store])
      --> KSTREAM-SINK-0000000003
      <-- KSTREAM-MAP-0000000001
    Sink: KSTREAM-SINK-0000000003 (topic: pl.allegro.analytics.page_view_raw_by_pv_id)
      <-- KSTREAM-TRANSFORM-0000000002{noformat}
1 state store (window size: 1 minute, window retention: 2 minutes); topic 
pl.allegro.analytics.page_view_raw: 64 partitions, retention: 72 hours; topic 
pl.allegro.analytics.page_view_raw_by_pv_id: 16 partitions, retention: 3 hours.

 

TOPOLOGY 2:
{noformat}
Topologies:
   Sub-topology: 0
    Source: KSTREAM-SOURCE-0000000000 (topics: [pl.allegro.analytics.event_raw])
      --> KSTREAM-MAP-0000000001
    Processor: KSTREAM-MAP-0000000001 (stores: [])
      --> KSTREAM-TRANSFORM-0000000002
      <-- KSTREAM-SOURCE-0000000000
    Processor: KSTREAM-TRANSFORM-0000000002 (stores: [event_raw_deduplication_store])
      --> KSTREAM-SINK-0000000003
      <-- KSTREAM-MAP-0000000001
    Sink: KSTREAM-SINK-0000000003 (topic: pl.allegro.analytics.event_raw_by_pv_id_local_pavel.savov)
      <-- KSTREAM-TRANSFORM-0000000002{noformat}
1 state store (window size: 1 minute, window retention: 2 minutes); topic 
pl.allegro.analytics.event_raw: 64 partitions, retention: 72 hours; topic 
pl.allegro.analytics.event_raw_by_pv_id: 16 partitions, retention: 3 hours.
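
Topologies 1 and 2 have the same shape: re-key each record by pv_id, 
deduplicate against a persistent window store (1-minute windows, 2-minute 
retention), and write the result to an output topic. A simplified sketch of 
that wiring, using topology 1's names (payload types, serdes and the 
transformer body here are illustrative, not our exact code):
{code:java}
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.Stores;
import org.apache.kafka.streams.state.WindowStore;
import org.apache.kafka.streams.state.WindowStoreIterator;

public class DeduplicationTopologySketch {

  static final String STORE = "page_view_raw_deduplication_store";

  public static StreamsBuilder build() {
    StreamsBuilder builder = new StreamsBuilder();

    // 1-minute windows with 2-minute retention, as described above.
    builder.addStateStore(Stores.windowStoreBuilder(
        Stores.persistentWindowStore(STORE,
            Duration.ofMinutes(2),   // retention
            Duration.ofMinutes(1),   // window size
            false),
        Serdes.String(), Serdes.String()));

    builder.<String, String>stream("pl.allegro.analytics.page_view_raw")
        // KSTREAM-MAP-0000000001: re-key each record by its pv_id
        .map((key, value) -> KeyValue.pair(extractPvId(value), value))
        // KSTREAM-TRANSFORM-0000000002: drop records already seen within the window
        .transform(Deduplicator::new, STORE)
        // KSTREAM-SINK-0000000003
        .to("pl.allegro.analytics.page_view_raw_by_pv_id");

    return builder;
  }

  // Placeholder for however the pv_id is actually derived from the payload.
  static String extractPvId(String value) {
    return value;
  }

  static class Deduplicator implements Transformer<String, String, KeyValue<String, String>> {
    private WindowStore<String, String> store;

    @SuppressWarnings("unchecked")
    @Override
    public void init(ProcessorContext context) {
      store = (WindowStore<String, String>) context.getStateStore(STORE);
    }

    @Override
    public KeyValue<String, String> transform(String pvId, String value) {
      long now = System.currentTimeMillis();
      // Drop the record if the same pv_id was already seen within the last minute.
      try (WindowStoreIterator<String> seen = store.fetch(pvId, now - 60_000L, now)) {
        if (seen.hasNext()) {
          return null;
        }
      }
      store.put(pvId, value, now);
      return KeyValue.pair(pvId, value);
    }

    @Override
    public void close() {}
  }
}
{code}
The window store registered via addStateStore() is the only state in these two 
topologies.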

 

TOPOLOGY 3:
{noformat}
Topologies:
   Sub-topology: 0
    Source: KSTREAM-SOURCE-0000000000 (topics: [pl.allegro.analytics.event_raw_by_pv_id])
      --> KSTREAM-FILTER-0000000002
    Processor: KSTREAM-FILTER-0000000002 (stores: [])
      --> KSTREAM-TRANSFORM-0000000004
      <-- KSTREAM-SOURCE-0000000000
    Processor: KSTREAM-TRANSFORM-0000000004 (stores: [performance_windowed_store])
      --> KSTREAM-MAPVALUES-0000000005
      <-- KSTREAM-FILTER-0000000002
    Processor: KSTREAM-MAPVALUES-0000000005 (stores: [])
      --> KSTREAM-FOREACH-0000000006
      <-- KSTREAM-TRANSFORM-0000000004
    Source: KSTREAM-SOURCE-0000000001 (topics: [pl.allegro.analytics.page_view_raw_by_pv_id])
      --> KSTREAM-PROCESSOR-0000000003
    Processor: KSTREAM-FOREACH-0000000006 (stores: [])
      --> none
      <-- KSTREAM-MAPVALUES-0000000005
    Processor: KSTREAM-PROCESSOR-0000000003 (stores: [performance_windowed_store])
      --> none
      <-- KSTREAM-SOURCE-0000000001{noformat}
1 state store (window size: 10 minutes, window retention: 11 minutes); topic 
pl.allegro.analytics.event_raw_by_pv_id: 16 partitions, retention: 3 hours; 
topic pl.allegro.analytics.page_view_raw_by_pv_id: 16 partitions, retention: 3 
hours.
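
Topology 4 below has exactly the same shape as topology 3 (only the store name 
differs), so one simplified sketch covers both: the page view stream is written 
into the shared window store (10-minute windows, 11-minute retention) by a 
custom processor, while the event stream is filtered, matched against that 
store in a transformer, mapped and consumed in a foreach. Types, the matching 
logic and the terminal side effect here are illustrative, not our exact code:
{code:java}
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.AbstractProcessor;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.state.Stores;
import org.apache.kafka.streams.state.WindowStore;
import org.apache.kafka.streams.state.WindowStoreIterator;

public class JoinLikeTopologySketch {

  static final String STORE = "performance_windowed_store";

  public static StreamsBuilder build() {
    StreamsBuilder builder = new StreamsBuilder();

    // 10-minute windows with 11-minute retention, as described above.
    builder.addStateStore(Stores.windowStoreBuilder(
        Stores.persistentWindowStore(STORE,
            Duration.ofMinutes(11),  // retention
            Duration.ofMinutes(10),  // window size
            false),
        Serdes.String(), Serdes.String()));

    // KSTREAM-SOURCE-0000000001 -> KSTREAM-PROCESSOR-0000000003:
    // page views are buffered in the shared window store, keyed by pv_id.
    builder.<String, String>stream("pl.allegro.analytics.page_view_raw_by_pv_id")
        .process(PageViewWriter::new, STORE);

    // KSTREAM-SOURCE-0000000000 -> FILTER -> TRANSFORM -> MAPVALUES -> FOREACH:
    // events are matched against the buffered page views.
    builder.<String, String>stream("pl.allegro.analytics.event_raw_by_pv_id")
        .filter((pvId, event) -> event != null)             // placeholder predicate
        .transform(EventMatcher::new, STORE)
        .mapValues(matched -> matched)                      // placeholder mapping
        .foreach((pvId, out) -> System.out.println(out));   // placeholder side effect

    return builder;
  }

  // Writes each page view into the shared window store.
  static class PageViewWriter extends AbstractProcessor<String, String> {
    private WindowStore<String, String> store;

    @SuppressWarnings("unchecked")
    @Override
    public void init(ProcessorContext context) {
      super.init(context);
      store = (WindowStore<String, String>) context.getStateStore(STORE);
    }

    @Override
    public void process(String pvId, String pageView) {
      store.put(pvId, pageView, context().timestamp());
    }
  }

  // Looks up the buffered page view for each event; the real matching logic is omitted.
  static class EventMatcher implements Transformer<String, String, KeyValue<String, String>> {
    private ProcessorContext context;
    private WindowStore<String, String> store;

    @SuppressWarnings("unchecked")
    @Override
    public void init(ProcessorContext context) {
      this.context = context;
      store = (WindowStore<String, String>) context.getStateStore(STORE);
    }

    @Override
    public KeyValue<String, String> transform(String pvId, String event) {
      long now = context.timestamp();
      String pageView = null;
      try (WindowStoreIterator<String> it =
               store.fetch(pvId, now - Duration.ofMinutes(10).toMillis(), now)) {
        while (it.hasNext()) {
          pageView = it.next().value;   // keep the most recent page view in the window
        }
      }
      return KeyValue.pair(pvId, pageView == null ? event : pageView + "|" + event);
    }

    @Override
    public void close() {}
  }
}
{code}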

 

TOPOLOGY 4:
{noformat}
Topologies:
   Sub-topology: 0
    Source: KSTREAM-SOURCE-0000000000 (topics: [pl.allegro.analytics.event_raw_by_pv_id])
      --> KSTREAM-FILTER-0000000002
    Processor: KSTREAM-FILTER-0000000002 (stores: [])
      --> KSTREAM-TRANSFORM-0000000004
      <-- KSTREAM-SOURCE-0000000000
    Processor: KSTREAM-TRANSFORM-0000000004 (stores: [opbox_boxes_windowed_store])
      --> KSTREAM-MAPVALUES-0000000005
      <-- KSTREAM-FILTER-0000000002
    Processor: KSTREAM-MAPVALUES-0000000005 (stores: [])
      --> KSTREAM-FOREACH-0000000006
      <-- KSTREAM-TRANSFORM-0000000004
    Source: KSTREAM-SOURCE-0000000001 (topics: [pl.allegro.analytics.page_view_raw_by_pv_id])
      --> KSTREAM-PROCESSOR-0000000003
    Processor: KSTREAM-FOREACH-0000000006 (stores: [])
      --> none
      <-- KSTREAM-MAPVALUES-0000000005
    Processor: KSTREAM-PROCESSOR-0000000003 (stores: [opbox_boxes_windowed_store])
      --> none
      <-- KSTREAM-SOURCE-0000000001{noformat}
1 state store (window size: 10 minutes, window retention: 11 minutes); topic 
pl.allegro.analytics.event_raw_by_pv_id: 16 partitions, retention: 3 hours; 
topic pl.allegro.analytics.page_view_raw_by_pv_id: 16 partitions, retention: 3 
hours.
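
For completeness, the RocksDB settings listed in the issue description below 
are applied through a custom RocksDBConfigSetter along these lines (a 
simplified sketch, not the exact class we run):
{code:java}
import java.util.Map;
import org.apache.kafka.streams.state.RocksDBConfigSetter;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Options;

// Illustrative config setter; class name and exact wiring are simplified.
public class CustomRocksDBConfig implements RocksDBConfigSetter {

  @Override
  public void setConfig(String storeName, Options options, Map<String, Object> configs) {
    BlockBasedTableConfig tableConfig = new BlockBasedTableConfig();
    tableConfig.setBlockCacheSize(4 * 1024 * 1024L);   // block cache size: 4 MiB
    tableConfig.setBlockSize(16 * 1024L);              // block size: 16 KiB
    tableConfig.setCacheIndexAndFilterBlocks(true);    // cache index and filter blocks
    options.setTableFormatConfig(tableConfig);

    options.setWriteBufferSize(2 * 1024 * 1024L);      // write buffer size: 2 MiB
    options.setMaxWriteBufferNumber(3);                // max write buffer number: 3
    options.setMaxOpenFiles(6144);                     // max open files: 6144
    options.setManifestPreallocationSize(64 * 1024L);  // manifest preallocation: 64 KiB
  }
}
{code}
It is registered via StreamsConfig.ROCKSDB_CONFIG_SETTER_CLASS_CONFIG 
(rocksdb.config.setter).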

 

Please let me know if there is anything else that could help you with the 
investigation.

Thanks!

 

 

> Non-heap memory leak in Kafka Streams
> -------------------------------------
>
>                 Key: KAFKA-8367
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8367
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.2.0
>            Reporter: Pavel Savov
>            Priority: Major
>         Attachments: memory-prod.png, memory-test.png
>
>
> We have been observing a non-heap memory leak after upgrading from Kafka 
> Streams 2.0.1 to 2.2.0. We suspect RocksDB as the source, since the leak only 
> occurs when we enable stateful stream operations (i.e. operations that use 
> stores). We are aware of *KAFKA-8323* and have ported the fix scheduled for 
> release in 2.2.1 to our own fork of 2.2.0, but it did not stop the leak.
> We see this memory leak both in our production environment, where the 
> consumer group is auto-scaled in and out in response to changes in traffic 
> volume, and in our test environment, where we have two consumers, no 
> autoscaling, and relatively constant traffic.
> Below is some information that I hope will be of help:
>  * RocksDB Config:
> Block cache size: 4 MiB
> Write buffer size: 2 MiB
> Block size: 16 KiB
> Cache index and filter blocks: true
> Manifest preallocation size: 64 KiB
> Max write buffer number: 3
> Max open files: 6144
>  
>  * Memory usage in production
> The attached graph (memory-prod.png) shows memory consumption for each 
> instance as a separate line. The horizontal red line at 6 GiB is the memory 
> limit.
> As illustrated in the attached graph from production, memory consumption in 
> the running instances goes up around autoscaling events (scaling the consumer 
> group either in or out) and the associated rebalancing. It stabilizes until 
> the next autoscaling event, but it never goes back down.
> An example of scaling out can be seen around 21:00, when three new instances 
> are started in response to a traffic spike.
> Just after midnight traffic drops and some instances are shut down; memory 
> consumption in the remaining instances goes up.
> Memory consumption climbs again from around 6:00 AM as traffic increases and 
> new instances are started until around 10:30 AM. It does not drop until the 
> cluster is restarted around 12:30.
>  
>  * Memory usage in test
> As illustrated by the attached graph (memory-test.png), our test environment 
> runs a fixed set of two instances with no autoscaling. Memory consumption 
> rises linearly until it reaches the limit (around 2:00 AM on 5/13) and Mesos 
> restarts the offending instances, or until we restart the cluster manually.
>  
>  * No heap leaks observed
>  * Window retention: 2 or 11 minutes (depending on operation type)
>  * Issue not present in Kafka Streams 2.0.1
>  * No memory leak for stateless stream operations (when no RocksDB stores are 
> used)
>  


