TheR1sing3un commented on code in PR #14362:
URL: https://github.com/apache/hudi/pull/14362#discussion_r2563956940


##########
hudi-common/src/main/java/org/apache/hudi/common/model/FileSlice.java:
##########
@@ -105,6 +108,15 @@ public FileSlice withLogFiles(boolean includeLogFiles) {
     }
   }
 
+  public FileSlice filterLogFiles(SerializableFunctionUnchecked<HoodieLogFile, 
Boolean> filter) {

Review Comment:
   > Hi, @TheR1sing3un , can you add a UT in `TestFileSlice` for this new func.
   
   Done~Thank you for your optimization



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/plan/generators/BaseHoodieCompactionPlanGenerator.java:
##########
@@ -133,25 +132,23 @@ public HoodieCompactionPlan generateCompactionPlan(String 
compactionInstant) thr
         .getLatestFileSlicesStateless(partitionPath)
         .filter(slice -> filterFileSlice(slice, lastCompletedInstantTime, 
fgIdsInPendingCompactionAndClustering, instantRange))
         .map(s -> {
-          List<HoodieLogFile> logFiles = s.getLogFiles()
-              // ==============================================================
-              // IMPORTANT
-              // ==============================================================
-              // Currently, our filesystem view could return a file slice with 
pending log files there,
-              // these files should be excluded from the plan, let's say we 
have such a sequence of actions
-
-              // t10: a delta commit starts,
-              // t20: the compaction is scheduled and the t10 delta commit is 
still pending, and the "fg_10.log" is included in the plan
-              // t25: the delta commit 10 finishes,
-              // t30: the compaction execution starts, now the reader 
considers the log file "fg_10.log" as valid.
-
-              // for both OCC and NB-CC, this is in-correct.
-              .filter(logFile -> 
completionTimeQueryView.isCompletedBefore(compactionInstant, 
logFile.getDeltaCommitTime()))
-              .sorted(HoodieLogFile.getLogFileComparator()).collect(toList());
-          if (logFiles.isEmpty()) {
-            // compaction is not needed if there is no log file.
-            return null;
-          }
+          // ==============================================================
+          // IMPORTANT
+          // ==============================================================
+          // Currently, our filesystem view could return a file slice with 
pending log files there,
+          // these files should be excluded from the plan, let's say we have 
such a sequence of actions
+
+          // t10: a delta commit starts,
+          // t20: the compaction is scheduled and the t10 delta commit is 
still pending, and the "fg_10.log" is included in the plan
+          // t25: the delta commit 10 finishes,
+          // t30: the compaction execution starts, now the reader considers 
the log file "fg_10.log" as valid.
+
+          // for both OCC and NB-CC, this is in-correct.

Review Comment:
   The reason why I modified the indentation of the above comment is that 
mvn:checkstyle determined that the above indentation was incorrect:
   ![Uploading image.png…]()
   



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/plan/generators/BaseHoodieCompactionPlanGenerator.java:
##########
@@ -133,25 +132,23 @@ public HoodieCompactionPlan generateCompactionPlan(String 
compactionInstant) thr
         .getLatestFileSlicesStateless(partitionPath)
         .filter(slice -> filterFileSlice(slice, lastCompletedInstantTime, 
fgIdsInPendingCompactionAndClustering, instantRange))
         .map(s -> {
-          List<HoodieLogFile> logFiles = s.getLogFiles()
-              // ==============================================================
-              // IMPORTANT
-              // ==============================================================
-              // Currently, our filesystem view could return a file slice with 
pending log files there,
-              // these files should be excluded from the plan, let's say we 
have such a sequence of actions
-
-              // t10: a delta commit starts,
-              // t20: the compaction is scheduled and the t10 delta commit is 
still pending, and the "fg_10.log" is included in the plan
-              // t25: the delta commit 10 finishes,
-              // t30: the compaction execution starts, now the reader 
considers the log file "fg_10.log" as valid.
-
-              // for both OCC and NB-CC, this is in-correct.
-              .filter(logFile -> 
completionTimeQueryView.isCompletedBefore(compactionInstant, 
logFile.getDeltaCommitTime()))
-              .sorted(HoodieLogFile.getLogFileComparator()).collect(toList());
-          if (logFiles.isEmpty()) {
-            // compaction is not needed if there is no log file.
-            return null;
-          }
+          // ==============================================================
+          // IMPORTANT
+          // ==============================================================
+          // Currently, our filesystem view could return a file slice with 
pending log files there,
+          // these files should be excluded from the plan, let's say we have 
such a sequence of actions
+
+          // t10: a delta commit starts,
+          // t20: the compaction is scheduled and the t10 delta commit is 
still pending, and the "fg_10.log" is included in the plan
+          // t25: the delta commit 10 finishes,
+          // t30: the compaction execution starts, now the reader considers 
the log file "fg_10.log" as valid.
+
+          // for both OCC and NB-CC, this is in-correct.

Review Comment:
   The reason why I modified the indentation of the above comment is that 
mvn:checkstyle determined that the above indentation was incorrect:
   <img width="1369" height="55" alt="image" 
src="https://github.com/user-attachments/assets/4fa31d31-4915-4ad8-928a-745b91f52d00";
 />



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to