leekeiabstraction opened a new pull request, #28271:
URL: https://github.com/apache/flink/pull/28271

   Backport of #28251 to `release-1.20`.
   
   ## What is the purpose of the change
   
   Fixes a native memory leak in the RocksDB SST merge `Compactor`. 
`ColumnFamilyHandle.getDescriptor()` copies the column family's options across 
JNI and returns a fresh native `ColumnFamilyOptions` on every call. 
`Compactor.compact()` read `numLevels()` from it but never closed it, so the 
native object leaked on every compaction. Because the leaked options retain a 
reference to the shared block cache (via `BlockBasedTableFactory` -> 
`BlockBasedTableOptions` -> `LRUCache`), the cache's `shared_ptr` is never 
released, preventing the block cache from being freed even after all tasks 
stop. As a result, task manager RSS keeps growing until the process eventually 
runs out of memory (OOM).
   
   ## Brief change log
   
     - Wrap `cfName.getDescriptor().getOptions()` in a try-with-resources block 
in `Compactor.compact()` so the native `ColumnFamilyOptions` is closed after 
`numLevels()` is read.
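
   The pattern behind this change can be sketched in isolation. The block 
below is a minimal illustration, not the actual Flink or RocksDB code: 
`FakeHandle` and `FakeOptions` are hypothetical stand-ins for 
`ColumnFamilyHandle` and `ColumnFamilyOptions`, and a plain counter stands in 
for the native allocations. It only assumes what the description above states: 
the getter returns a fresh closeable copy on every call.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CompactorLeakSketch {

    /** Hypothetical stand-in for a handle whose getter returns a fresh copy per call. */
    static final class FakeHandle {
        // Counts copies that were handed out and not yet closed.
        final AtomicInteger liveCopies = new AtomicInteger();

        FakeOptions getOptions() {
            liveCopies.incrementAndGet();
            return new FakeOptions(this, 7);
        }
    }

    /** Hypothetical stand-in for a native-backed options object. */
    static final class FakeOptions implements AutoCloseable {
        private final FakeHandle owner;
        private final int numLevels;
        private boolean closed;

        FakeOptions(FakeHandle owner, int numLevels) {
            this.owner = owner;
            this.numLevels = numLevels;
        }

        int numLevels() {
            return numLevels;
        }

        @Override
        public void close() {
            // Idempotent: only the first close releases the copy.
            if (!closed) {
                closed = true;
                owner.liveCopies.decrementAndGet();
            }
        }
    }

    /** Shape of the code before the fix: the copy is read and then dropped unclosed. */
    static int leakyNumLevels(FakeHandle handle) {
        return handle.getOptions().numLevels();
    }

    /** Shape of the code after the fix: try-with-resources closes the copy after the read. */
    static int fixedNumLevels(FakeHandle handle) {
        try (FakeOptions options = handle.getOptions()) {
            return options.numLevels();
        }
    }

    public static void main(String[] args) {
        FakeHandle leaky = new FakeHandle();
        FakeHandle fixed = new FakeHandle();
        for (int i = 0; i < 3; i++) {
            leakyNumLevels(leaky);
            fixedNumLevels(fixed);
        }
        System.out.println("unclosed copies (leaky): " + leaky.liveCopies.get()); // 3
        System.out.println("unclosed copies (fixed): " + fixed.liveCopies.get()); // 0
    }
}
```

   Both methods return the same value, which mirrors the claim that the 
output level computation is unchanged; the only difference is whether each 
copy is released.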
   
   ## Verifying this change
   
   The leak and the fix were verified with jemalloc profiling (`jeprof`), 
running Flink in session mode and repeatedly starting/stopping jobs to trigger 
the compactor while tracking the `rocksdb::BlockFetcher::ReadBlockContents` 
call stack that dominates block-cache allocations. The configured block cache 
capacity was 833MB.
   
     - **Before the fix:** after jobs were stopped and resubmitted, 
`ReadBlockContents` grew to ~1.54GB, far exceeding the 833MB cache capacity; 
the jemalloc heap profile reported a total of 2,280,777,636 bytes.
     - **After the fix:** the task manager with the highest RSS held ~800MB in 
`ReadBlockContents`, consistent with the 833MB capacity; the jemalloc heap 
profile reported a total of 1,416,132,765 bytes.
   
   This is a ~38% reduction in the total reported by the jemalloc heap profile 
(2,280,777,636 to 1,416,132,765 bytes) and eliminates the cache-capacity 
overage, which indicates that the `LRUCache` leak caused by the unclosed 
`ColumnFamilyOptions` is resolved. The behavior (output level computation) is 
unchanged and is covered by existing tests; the only difference is that the 
previously leaked native handle is now closed.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
     - The serializers: no
     - The runtime per-record code paths (performance sensitive): no
     - Anything that affects deployment or recovery: no
     - The S3 file system connector: no
   
   ## Documentation
   
     - Does this pull request introduce a new feature? no
     - If yes, how is the feature documented? not applicable

