mjsax commented on code in PR #22378:
URL: https://github.com/apache/kafka/pull/22378#discussion_r3307889810


##########
streams/src/main/java/org/apache/kafka/streams/state/internals/RocksDBStore.java:
##########
@@ -316,14 +320,23 @@ private void addValueProvidersToMetricsRecorder() {
      * need the heavyweight options used for the data CF (large write buffers, 
bloom filters,
      * aggressive compaction). Sharing the data CF's options causes 
unnecessary write amplification
      * and compaction pressure that can contribute to RocksDB write stalls 
under heavy restore I/O.
+     *
+     * <p>The returned {@code ColumnFamilyOptions} is stored on {@link 
#offsetsCfOptions} so that
+     * {@link #close()} can release it. Constructing a {@code 
ColumnFamilyOptions} on the JNI side
+     * auto-allocates a default {@code BlockBasedTableFactory} and its {@code 
LRUCache}; if the
+     * options object is never closed, that native cache leaks. We also 
explicitly disable the block
+     * cache for this CF because the offsets metadata is tiny and benefits 
nothing from caching.

Review Comment:
   Similar question as above? -- Maybe not remove, but at least shorten?



##########
streams/src/main/java/org/apache/kafka/streams/state/internals/RocksDBStore.java:
##########
@@ -129,6 +129,10 @@ public class RocksDBStore implements KeyValueStore<Bytes, 
byte[]>, BatchWritingS
     private Cache cache;
     private BloomFilter filter;
     private Statistics statistics;
+    // Options for the offsets column family, kept on a field so close() can 
release the
+    // native resources (notably the BlockBasedTableFactory's default 
LRUCache) that the
+    // RocksDB C++ side auto-allocates when a ColumnFamilyOptions is 
constructed.

Review Comment:
   This comment seems overly verbose? Should we remove it entirely? 



##########
streams/src/main/java/org/apache/kafka/streams/state/internals/RocksDBStore.java:
##########
@@ -316,14 +320,23 @@ private void addValueProvidersToMetricsRecorder() {
      * need the heavyweight options used for the data CF (large write buffers, 
bloom filters,
      * aggressive compaction). Sharing the data CF's options causes 
unnecessary write amplification
      * and compaction pressure that can contribute to RocksDB write stalls 
under heavy restore I/O.
+     *
+     * <p>The returned {@code ColumnFamilyOptions} is stored on {@link 
#offsetsCfOptions} so that
+     * {@link #close()} can release it. Constructing a {@code 
ColumnFamilyOptions} on the JNI side
+     * auto-allocates a default {@code BlockBasedTableFactory} and its {@code 
LRUCache}; if the
+     * options object is never closed, that native cache leaks. We also 
explicitly disable the block
+     * cache for this CF because the offsets metadata is tiny and benefits 
nothing from caching.
      */
-    protected static ColumnFamilyOptions createOffsetsCFOptions() {
-        final ColumnFamilyOptions offsetsCFOptions = new ColumnFamilyOptions();
-        offsetsCFOptions.setCompressionType(CompressionType.NO_COMPRESSION);
-        offsetsCFOptions.setCompactionStyle(CompactionStyle.LEVEL);
-        offsetsCFOptions.setWriteBufferSize(1024 * 1024L); // 1MB — sufficient 
for offset metadata
-        offsetsCFOptions.setMaxWriteBufferNumber(2);
-        return offsetsCFOptions;
+    protected ColumnFamilyOptions createOffsetsCFOptions() {
+        offsetsCfOptions = new ColumnFamilyOptions();
+        offsetsCfOptions.setCompressionType(CompressionType.NO_COMPRESSION);
+        offsetsCfOptions.setCompactionStyle(CompactionStyle.LEVEL);
+        offsetsCfOptions.setWriteBufferSize(1024 * 1024L); // 1MB — sufficient 
for offset metadata
+        offsetsCfOptions.setMaxWriteBufferNumber(2);
+        final BlockBasedTableConfig offsetsTableConfig = new 
BlockBasedTableConfig();
+        offsetsTableConfig.setNoBlockCache(true);
+        offsetsCfOptions.setTableFormatConfig(offsetsTableConfig);

Review Comment:
   Seems this is newly added entirely? Can you elaborate why we need/want to 
set these `BlockBasedTableConfig` and why `setNoBlockCache(true)` ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to