kamronis commented on code in PR #17607:
URL: https://github.com/apache/hudi/pull/17607#discussion_r2625987952
##########
hudi-common/src/main/java/org/apache/hudi/common/table/cdc/HoodieCDCExtractor.java:
##########
@@ -341,6 +341,20 @@ private Option<FileSlice> getDependentFileSliceForLogFile(
.filter(logFile -> !logFile.equals(currentLogFileName))
.map(logFile -> new StoragePath(partitionPath, logFile))
.collect(Collectors.toList());
+    // get files list from unfinished compaction commit
+    List<StoragePath> filesToCompact =
+        metaClient.getActiveTimeline().getInstants().stream().filter(
+            i -> i.compareTo(instant) < 0 && !i.isCompleted() && i.getAction()
+                .equals(HoodieActiveTimeline.COMPACTION_ACTION))
Review Comment:
Hi @danny0405 !
1) I don't think we should return only inflight compactions. We need the
incomplete instants (inflight and requested) that are older than the instant
we're on. In the test I use, compaction is disabled and only scheduling
happens; scheduling alone is enough to cause the bug.
You are right about filtering by fgId, I added that to my commit filter.
Now the first filter selects all the incomplete compaction instants and the
second filters by fgId.
If I understand the compaction code correctly, we cannot have more than one
compaction for the same fgId at the same time, so there is no need to take
the max.
2) There is no need to include the base file from the compaction operation in
the file slice, because the base file (or the CDC file of that base file) will
be parsed correctly in the parseWriteStat function on its own instant. I added
compaction as a test parameter to cover that.
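To make the intended filter concrete, here is a minimal standalone sketch of
the selection logic described above: pick the incomplete (requested or
inflight) compaction instants that are strictly older than the instant
currently being processed. The `Instant` class below is a hypothetical model
for illustration only, not the actual Hudi `HoodieInstant` API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, simplified model of a timeline instant.
class Instant implements Comparable<Instant> {
    final String timestamp;
    final String action;
    final boolean completed;

    Instant(String timestamp, String action, boolean completed) {
        this.timestamp = timestamp;
        this.action = action;
        this.completed = completed;
    }

    @Override
    public int compareTo(Instant other) {
        // Instants are ordered by their timestamp string.
        return timestamp.compareTo(other.timestamp);
    }
}

public class CompactionFilterSketch {
    static final String COMPACTION_ACTION = "compaction";

    // Select incomplete (requested or inflight) compaction instants that
    // are strictly older than the instant currently being processed.
    static List<Instant> incompleteCompactionsBefore(List<Instant> timeline,
                                                     Instant current) {
        return timeline.stream()
            .filter(i -> i.compareTo(current) < 0
                && !i.completed
                && i.action.equals(COMPACTION_ACTION))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Instant current = new Instant("005", "deltacommit", false);
        List<Instant> timeline = Arrays.asList(
            new Instant("001", COMPACTION_ACTION, true),  // completed: excluded
            new Instant("002", COMPACTION_ACTION, false), // scheduled only: included
            new Instant("003", "deltacommit", true),      // wrong action: excluded
            new Instant("006", COMPACTION_ACTION, false), // newer: excluded
            current);
        for (Instant i : incompleteCompactionsBefore(timeline, current)) {
            System.out.println(i.timestamp); // prints 002
        }
    }
}
```

A second per-fgId filter (not shown) would then narrow these instants to the
file group being resolved, as described in the comment above.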
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]