chia7712 commented on code in PR #20913:
URL: https://github.com/apache/kafka/pull/20913#discussion_r3301880370


##########
storage/src/main/java/org/apache/kafka/server/log/remote/storage/RemoteLogManager.java:
##########
@@ -925,18 +927,73 @@ List<EnrichedLogSegment> candidateLogSegments(UnifiedLog 
log, Long fromOffset, L
             List<EnrichedLogSegment> candidateLogSegments = new ArrayList<>();
             List<LogSegment> segments = log.logSegments(fromOffset, 
Long.MAX_VALUE);
             if (!segments.isEmpty()) {
+                long currentTimeMs = time.milliseconds();
+                long totalLogSize = UnifiedLog.sizeInBytes(segments);
+                long cumulativeSize = 0;
                 for (int idx = 1; idx < segments.size(); idx++) {
                     LogSegment previousSeg = segments.get(idx - 1);
                     LogSegment currentSeg = segments.get(idx);
                     if (currentSeg.baseOffset() <= lastStableOffset) {
-                        candidateLogSegments.add(new 
EnrichedLogSegment(previousSeg, currentSeg.baseOffset()));
+                        cumulativeSize += previousSeg.size();
+                        if (isEligibleForUpload(log.config(), previousSeg, 
currentTimeMs, totalLogSize, cumulativeSize)) {
+                            candidateLogSegments.add(new 
EnrichedLogSegment(previousSeg, currentSeg.baseOffset()));
+                        } else {
+                            break;
+                        }
                     }
                 }
                 // Discard the last active segment
             }
             return candidateLogSegments;
         }
 
+        private boolean isEligibleForUpload(LogConfig logConfig, LogSegment 
previousSeg, long currentTimeMs, long totalLogSize, long cumulativeSize) {
+            long copyLagMs = logConfig.remoteCopyLagMs();
+            long copyLagBytes = logConfig.remoteCopyLagBytes();
+            if (logger.isTraceEnabled()) {
+                logger.trace("delayCopy check for segment {}: copyLagMs={}, 
copyLagBytes={}, currentTimeMs={}, totalLogSize={}, cumulativeSize={}, 
sizeLagBytes={}",
+                        previousSeg, copyLagMs, copyLagBytes, currentTimeMs, 
totalLogSize, cumulativeSize, totalLogSize - cumulativeSize);
+            }
+
+            if (copyLagMs == 0 || copyLagBytes == 0) {
+                return true;
+            }
+
+            boolean limitedCopyLagMsCheck =  copyLagMs > 0;
+            boolean limitedCopyLagSizeCheck = copyLagBytes > 0;
+
+            if (limitedCopyLagMsCheck && eligibleUploadByTime(previousSeg, 
currentTimeMs, copyLagMs)) {
+                return true;
+            }
+
+            return limitedCopyLagSizeCheck && 
eligibleUploadBySize(previousSeg, totalLogSize, cumulativeSize, copyLagBytes);
+        }
+
+        private boolean eligibleUploadByTime(LogSegment segment, long 
currentTimeMs, long copyLagMs) {
+            try {
+                long segmentAgeMs = currentTimeMs - segment.largestTimestamp();
+                boolean eligibleUpload = segmentAgeMs < 0 || segmentAgeMs >= 
copyLagMs;

Review Comment:
   I just realized that I replied to the wrong thread earlier, sorry about that 
 :cry: 
   
   > How about this solution? if we found it contain future record. we use 
LogSegment#lastModified to compare the time?
   
   Yes, that is an acceptable approach. I've opened 
https://issues.apache.org/jira/browse/KAFKA-20609 so we can keep discussing 
there.
   
   > yeah, this is not intuitive when compared with the other configs like 
retention time/bytes. My suggestion is to:
   
   While documentation is the final line of defense for users, it's better to 
have an intuitive design out of the box. After all, me proposed reverse 
configuration was rejected precisely because it wasn't intuitive enough
   
   Another way is to align the logic with retention.ms and retention.bytes. 
Since most users care more about time than size, we could set the default value 
of the time lag to 0, and the size lag to -1. This way, most users can just 
adjust the time setting without having to touch the size configuration. WDYT?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to