kamronis commented on code in PR #17607:
URL: https://github.com/apache/hudi/pull/17607#discussion_r2625987952


##########
hudi-common/src/main/java/org/apache/hudi/common/table/cdc/HoodieCDCExtractor.java:
##########
@@ -341,6 +341,20 @@ private Option<FileSlice> getDependentFileSliceForLogFile(
               .filter(logFile -> !logFile.equals(currentLogFileName))
               .map(logFile -> new StoragePath(partitionPath, logFile))
               .collect(Collectors.toList());
+          // get files list from unfinished compaction commit
+          List<StoragePath> filesToCompact = metaClient.getActiveTimeline().getInstants().stream().filter(
+                  i -> i.compareTo(instant) < 0 && !i.isCompleted() && i.getAction()
+                      .equals(HoodieActiveTimeline.COMPACTION_ACTION))

Review Comment:
   Hi @danny0405!
   1) I don't think we should return only inflight compactions. We need all incomplete instants (both inflight and requested) that are older than the instant we are on. In the test I use, compaction execution is disabled and only scheduling happens; scheduling alone is enough to trigger the bug.
   You are right about filtering by fgId, I added that to my commit filter.
   Now the first filter selects all the incomplete compaction instants and the second filters by fgId.
   If I understand the compaction code correctly, we cannot have more than one compaction for the same fgId at the same time, so there is no need to take the max.
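   
   To make the selection in 1) concrete, here is a small self-contained sketch. The `Instant` record, `State` enum, and `COMPACTION_ACTION` constant below are stand-ins for Hudi's `HoodieInstant` and `HoodieTimeline.COMPACTION_ACTION`, not the real API; only the filter predicate mirrors the PR code:
   
   ```java
   import java.util.List;
   import java.util.stream.Collectors;
   
   public class CompactionFilterSketch {
     enum State { REQUESTED, INFLIGHT, COMPLETED }
   
     // Minimal stand-in for HoodieInstant: the timestamp orders instants.
     record Instant(String timestamp, State state, String action) implements Comparable<Instant> {
       boolean isCompleted() { return state == State.COMPLETED; }
       public int compareTo(Instant o) { return timestamp.compareTo(o.timestamp); }
     }
   
     static final String COMPACTION_ACTION = "compaction";
   
     // All incomplete (requested or inflight) compaction instants strictly
     // older than the instant currently being processed.
     static List<Instant> incompleteCompactionsBefore(List<Instant> timeline, Instant current) {
       return timeline.stream()
           .filter(i -> i.compareTo(current) < 0
               && !i.isCompleted()
               && i.action().equals(COMPACTION_ACTION))
           .collect(Collectors.toList());
     }
   
     public static void main(String[] args) {
       Instant current = new Instant("005", State.INFLIGHT, "deltacommit");
       List<Instant> timeline = List.of(
           new Instant("001", State.COMPLETED, COMPACTION_ACTION), // completed: excluded
           new Instant("002", State.REQUESTED, COMPACTION_ACTION), // scheduled only: included
           new Instant("003", State.INFLIGHT, COMPACTION_ACTION),  // inflight: included
           new Instant("004", State.INFLIGHT, "deltacommit"),      // not a compaction: excluded
           current);                                               // not older: excluded
       incompleteCompactionsBefore(timeline, current)
           .forEach(i -> System.out.println(i.timestamp()));
     }
   }
   ```
   
   Note the sketch deliberately includes `REQUESTED` instants, since a merely scheduled compaction is enough to reproduce the bug.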
   
   2) There is no need to include the base file from the compaction operation in the file slice, because that base file (or its CDC file) will be parsed correctly in the parseWriteStat function under its own instant. I added compaction as a test parameter to cover that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to