m1a2st commented on code in PR #21379:
URL: https://github.com/apache/kafka/pull/21379#discussion_r2803957651


##########
storage/src/main/java/org/apache/kafka/storage/internals/log/Cleaner.java:
##########
@@ -169,9 +176,17 @@ public Map.Entry<Long, CleanerStats> doClean(LogToClean cleanable, long currentT
                 log.name(), new Date(cleanableHorizonMs), new Date(legacyDeleteHorizonMs));
         CleanedTransactionMetadata transactionMetadata = new CleanedTransactionMetadata();
 
+        double sizeRatio = segmentOverflowPartitions.getOrDefault(log.topicPartition(), 1.0);
+        if (sizeRatio != 1.0) {
+            logger.info("Partition {} has overflow history. " + "Reducing effective segment size to {}% for this round.",
+                    log.topicPartition(), sizeRatio * 100);
+        }
+
+        int effectiveMaxSize = (int) (log.config().segmentSize() * sizeRatio);
+
         List<List<LogSegment>> groupedSegments = groupSegmentsBySize(
                 log.logSegments(0, endOffset),
-                log.config().segmentSize(),
+                effectiveMaxSize,
Review Comment:
   I think we should retain `groupSegmentsBySize()` to control temporary disk 
usage, and handle overflow dynamically within `cleanInto()` by creating 
multiple cleaned segments as needed.
   
   This approach avoids running out of disk space during a cleaning pass while still handling segment overflow gracefully: peak temporary disk usage stays bounded by the group size rather than by the total log size. A rough sketch of what I have in mind is below.


