nsivabalan commented on code in PR #10915:
URL: https://github.com/apache/hudi/pull/10915#discussion_r1571247566
##########
hudi-common/src/main/java/org/apache/hudi/common/table/cdc/HoodieCDCExtractor.java:
##########
@@ -114,6 +114,24 @@ public Map<HoodieFileGroupId, List<HoodieCDCFileSplit>>
extractCDCFileSplits() {
ValidationUtils.checkState(commits != null, "Empty commits");
Map<HoodieFileGroupId, List<HoodieCDCFileSplit>> fgToCommitChanges = new
HashMap<>();
+
Review Comment:
Nope. I am saying that HoodieCDCExtractor.java has an inherent bug, which I am
trying to fix here.
Let's say the timeline is as follows:
dc1
dc2
rc3
dc4
clean5 // cleans up the data files from dc1 and dc2, since they were replaced
by rc3.
As per master, HoodieCDCExtractor goes over the commit metadata in the active
timeline and tries to deduce the base files for the log files it finds. In this
case, all data files from dc1 and dc2 have already been deleted by clean5, so
on master we might hit a file-not-found error.
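
To make the scenario concrete, here is a minimal, self-contained sketch. It is
not Hudi code: `Instant`, `findMissingFiles`, and the file names are
hypothetical, and it assumes dc means delta commit and rc means replace commit.
It only models the idea that walking commit metadata can yield files that the
cleaner has already removed from storage.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CdcCleanedFileSketch {

  // Hypothetical stand-in for one timeline instant and the data files
  // that its commit metadata lists.
  static final class Instant {
    final String name;
    final List<String> writtenFiles;

    Instant(String name, String... writtenFiles) {
      this.name = name;
      this.writtenFiles = Arrays.asList(writtenFiles);
    }
  }

  // Commits still present in the active timeline. clean5 is not a commit,
  // so its effect is modeled only through storageAfterClean().
  static List<Instant> sampleTimeline() {
    return Arrays.asList(
        new Instant("dc1", "f1_dc1"),
        new Instant("dc2", "f1_dc2"),
        new Instant("rc3", "f2_rc3"),   // replaces the files written by dc1 and dc2
        new Instant("dc4", "f2_dc4"));
  }

  // Files left on storage once clean5 has removed the replaced files.
  static Set<String> storageAfterClean() {
    return new HashSet<>(Arrays.asList("f2_rc3", "f2_dc4"));
  }

  // Walks the commit metadata the way the comment describes and collects
  // every referenced file that no longer exists on storage.
  static List<String> findMissingFiles(List<Instant> timeline, Set<String> storage) {
    List<String> missing = new ArrayList<>();
    for (Instant instant : timeline) {
      for (String file : instant.writtenFiles) {
        if (!storage.contains(file)) {
          missing.add(file);
        }
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    // Each entry here is a file a metadata-driven reader would try to open and fail on.
    System.out.println(findMissingFiles(sampleTimeline(), storageAfterClean()));
  }
}
```

In this toy model the files from dc1 and dc2 are still referenced by the commit
metadata but are gone from storage, which is the file-not-found case described
above.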