[ 
https://issues.apache.org/jira/browse/KAFKA-20616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Bejeck updated KAFKA-20616:
--------------------------------
    Description: 
Primary leak (KAFKA-20456 follow-up). RocksDBStore.createOffsetsCFOptions()
returns a new ColumnFamilyOptions() that is passed to a ColumnFamilyDescriptor
and then dropped: it is never assigned to a field and never closed. On the JNI
side, constructing a ColumnFamilyOptions auto-allocates a default
BlockBasedTableFactory with an 8 MB LRUCache. The leak compounds per segment,
per task, so windowed/segmented stores amplify it heavily.
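
A minimal sketch of the ownership pattern that would avoid the primary leak,
assuming the remedy is to keep the options object in a field and close it
together with the store. FakeNativeOptions and OwningStoreSketch are
hypothetical stand-ins so the example runs without RocksDB; this is not the
actual RocksDBStore code and not necessarily the committed fix.

```java
/** Hypothetical stand-in for a native-backed object such as ColumnFamilyOptions. */
final class FakeNativeOptions implements AutoCloseable {
    // Number of instances whose (simulated) native memory is still allocated.
    static int live = 0;
    private boolean closed;

    FakeNativeOptions() {
        live++;
    }

    @Override
    public void close() {
        // Idempotent: a second close() must not release twice.
        if (!closed) {
            closed = true;
            live--;
        }
    }
}

/** Hypothetical store that owns its options object for its whole lifetime. */
final class OwningStoreSketch implements AutoCloseable {
    // Held in a field so close() can release it; a dropped reference would leak.
    private FakeNativeOptions offsetsCfOptions;

    void open() {
        offsetsCfOptions = new FakeNativeOptions();
        // Real code would hand the options to a ColumnFamilyDescriptor here.
    }

    @Override
    public void close() {
        if (offsetsCfOptions != null) {
            offsetsCfOptions.close();
            offsetsCfOptions = null;
        }
    }
}
```

With this shape, every open() is balanced by a release in close(), so the
number of live options objects no longer grows with segments or tasks.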
                                                                                
Secondary leak (KIP-1035 close path). AbstractColumnFamilyAccessor.close()
writes a closedState marker to the offsets CF; if that write throws (which
happens during the EOSv2 cascade or unclean shutdown, a case the existing code
comment already acknowledges), the subsequent offsetColumnFamilyHandle.close()
is skipped. SingleColumnFamilyAccessor.close() and
DualColumnFamilyAccessor.close() have the same non-finally ordering, so the
data CF (and oldCF/newCF for migrating stores) handles also leak whenever
super.close() propagates. RocksDBStore.close() swallows the resulting
RocksDBException, so the leak is silent.
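
A minimal sketch of the try/finally ordering that would keep the handles from
leaking when the marker write throws. FakeHandle, BaseAccessorSketch and
SingleAccessorSketch are hypothetical stand-ins for ColumnFamilyHandle and the
accessor classes named above, and IllegalStateException stands in for
RocksDBException; this illustrates the ordering only, under those assumptions.

```java
/** Hypothetical stand-in for a native handle such as ColumnFamilyHandle. */
final class FakeHandle implements AutoCloseable {
    boolean closed;

    @Override
    public void close() {
        closed = true;
    }
}

/** Hypothetical base accessor: marker write first, handle release guaranteed. */
class BaseAccessorSketch implements AutoCloseable {
    final FakeHandle offsetsHandle = new FakeHandle();
    private final boolean markerWriteFails;

    BaseAccessorSketch(boolean markerWriteFails) {
        this.markerWriteFails = markerWriteFails;
    }

    // Simulates the closed-state marker write, which may fail on unclean shutdown.
    private void writeClosedMarker() {
        if (markerWriteFails) {
            throw new IllegalStateException("marker write failed");
        }
    }

    @Override
    public void close() {
        try {
            writeClosedMarker();
        } finally {
            // Runs even if the marker write throws, so the handle is released.
            offsetsHandle.close();
        }
    }
}

/** Hypothetical subclass that also owns a data column family handle. */
final class SingleAccessorSketch extends BaseAccessorSketch {
    final FakeHandle dataHandle = new FakeHandle();

    SingleAccessorSketch(boolean markerWriteFails) {
        super(markerWriteFails);
    }

    @Override
    public void close() {
        try {
            super.close();
        } finally {
            // Runs even if super.close() propagates an exception.
            dataHandle.close();
        }
    }
}
```

The exception still propagates to the caller, so a caller that swallows it
hides only the failed marker write and no longer a leaked handle.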

  was:
Primary leak (KAFKA-20456 follow-up). RocksDBStore.createOffsetsCFOptions()
returns a new ColumnFamilyOptions() that is passed to a ColumnFamilyDescriptor
and then dropped: it is never assigned to a field and never closed. On the JNI
side, constructing a ColumnFamilyOptions auto-allocates a default
BlockBasedTableFactory with an 8 MB LRUCache. Native heap profiles from the
soak confirm this directly:
Java_org_rocksdb_ColumnFamilyOptions_newColumnFamilyOptions →
BlockBasedTableFactory::InitializeOptions → LRUCacheOptions::MakeSharedCache
accounts for 5.5 GB (70%) on soak1 and 2.6 GB (54%) on soak2. The leak
compounds per segment, per task, so windowed/segmented stores amplify it
heavily.

Secondary leak (KIP-1035 close path). AbstractColumnFamilyAccessor.close()
writes a closedState marker to the offsets CF; if that write throws (which
happens during the EOSv2 cascade or unclean shutdown, a case the existing code
comment already acknowledges), the subsequent offsetColumnFamilyHandle.close()
is skipped. SingleColumnFamilyAccessor.close() and
DualColumnFamilyAccessor.close() have the same non-finally ordering, so the
data CF (and oldCF/newCF for migrating stores) handles also leak whenever
super.close() propagates. RocksDBStore.close() swallows the resulting
RocksDBException, so the leak is silent.


> Close-path leaks in RocksDBStore cause native memory growth that eventually 
> leads to OOM
> ----------------------------------------------------------------------------------------
>
>                 Key: KAFKA-20616
>                 URL: https://issues.apache.org/jira/browse/KAFKA-20616
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 4.3.0, 4.4.0
>            Reporter: Bill Bejeck
>            Assignee: Bill Bejeck
>            Priority: Blocker
>             Fix For: 4.3.1, 4.4.0
>
>
> Primary leak (KAFKA-20456 follow-up). RocksDBStore.createOffsetsCFOptions()
> returns a new ColumnFamilyOptions() that is passed to a
> ColumnFamilyDescriptor and then dropped: it is never assigned to a field and
> never closed. On the JNI side, constructing a ColumnFamilyOptions
> auto-allocates a default BlockBasedTableFactory with an 8 MB LRUCache. The
> leak compounds per segment, per task, so windowed/segmented stores amplify
> it heavily.
>
> Secondary leak (KIP-1035 close path). AbstractColumnFamilyAccessor.close()
> writes a closedState marker to the offsets CF; if that write throws (which
> happens during the EOSv2 cascade or unclean shutdown, a case the existing
> code comment already acknowledges), the subsequent
> offsetColumnFamilyHandle.close() is skipped.
> SingleColumnFamilyAccessor.close() and DualColumnFamilyAccessor.close() have
> the same non-finally ordering, so the data CF (and oldCF/newCF for migrating
> stores) handles also leak whenever super.close() propagates.
> RocksDBStore.close() swallows the resulting RocksDBException, so the leak is
> silent.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
