junrao commented on code in PR #21379:
URL: https://github.com/apache/kafka/pull/21379#discussion_r2800694490


##########
storage/src/main/java/org/apache/kafka/storage/internals/log/Cleaner.java:
##########
@@ -169,9 +176,17 @@ public Map.Entry<Long, CleanerStats> doClean(LogToClean cleanable, long currentT
                 log.name(), new Date(cleanableHorizonMs), new Date(legacyDeleteHorizonMs));
         CleanedTransactionMetadata transactionMetadata = new CleanedTransactionMetadata();
 
+        double sizeRatio = segmentOverflowPartitions.getOrDefault(log.topicPartition(), 1.0);
+        if (sizeRatio != 1.0) {
+            logger.info("Partition {} has overflow history. " + "Reducing effective segment size to {}% for this round.",
+                    log.topicPartition(), sizeRatio * 100);
+        }
+
+        int effectiveMaxSize = (int) (log.config().segmentSize() * sizeRatio);
+
         List<List<LogSegment>> groupedSegments = groupSegmentsBySize(
                 log.logSegments(0, endOffset),
-                log.config().segmentSize(),
+                effectiveMaxSize,

Review Comment:
   That approach could work, but one has to guess the size to split the 
segments into. Have you considered the alternative of creating multiple cleaned 
segments? log.replaceSegments() already supports replacing multiple segments. 
If cleanInto() hits a file overflow exception, we could close the current 
cleaned segment, create a new one and continue the cleaning.
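
To make the suggestion concrete, here is a rough, self-contained sketch of the roll-on-overflow idea. It is not the Cleaner code: `RollingCleanSketch`, `CleanedSegment`, `SegmentOverflowException` and `cleanWithRolling` are hypothetical stand-ins, and records are modeled only by their sizes in bytes. The point is just that the cleaner does not need to guess a split size up front; it can react to the overflow when it happens.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these types are Kafka classes.
class RollingCleanSketch {

    /** Stand-in for the overflow signalled when a cleaned segment is full. */
    static class SegmentOverflowException extends RuntimeException {
        SegmentOverflowException(String message) {
            super(message);
        }
    }

    /** Stand-in for a cleaned segment; it only tracks the sizes of appended records. */
    static class CleanedSegment {
        private final int maxBytes;
        private final List<Integer> recordSizes = new ArrayList<>();
        private long bytes = 0;

        CleanedSegment(int maxBytes) {
            this.maxBytes = maxBytes;
        }

        void append(int recordBytes) {
            if (bytes + recordBytes > maxBytes) {
                throw new SegmentOverflowException("segment full at " + bytes + " bytes");
            }
            bytes += recordBytes;
            recordSizes.add(recordBytes);
        }

        boolean isEmpty() {
            return recordSizes.isEmpty();
        }

        List<Integer> recordSizes() {
            return recordSizes;
        }
    }

    /**
     * Copies the retained records into cleaned segments. On overflow, the current
     * cleaned segment is closed, a new one is created, and cleaning continues.
     */
    static List<List<Integer>> cleanWithRolling(List<Integer> retainedRecordSizes, int maxBytes) {
        List<List<Integer>> cleaned = new ArrayList<>();
        CleanedSegment current = new CleanedSegment(maxBytes);
        for (int size : retainedRecordSizes) {
            if (size > maxBytes) {
                // A single record that cannot fit in any segment; rolling would not help.
                throw new IllegalArgumentException("record of " + size + " bytes exceeds segment size");
            }
            try {
                current.append(size);
            } catch (SegmentOverflowException e) {
                cleaned.add(current.recordSizes());     // close the full cleaned segment
                current = new CleanedSegment(maxBytes); // roll to a new one
                current.append(size);                   // retry; fits because size <= maxBytes
            }
        }
        if (!current.isEmpty()) {
            cleaned.add(current.recordSizes());
        }
        return cleaned;
    }

    public static void main(String[] args) {
        // Prints [[40, 40], [40, 10]]
        System.out.println(cleanWithRolling(List.of(40, 40, 40, 10), 100));
    }
}
```

In the real cleaner, the resulting list of cleaned segments would then be handed to log.replaceSegments() in one call, which is what makes this an alternative to shrinking the group size in advance.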



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
