nsivabalan commented on code in PR #10915:
URL: https://github.com/apache/hudi/pull/10915#discussion_r1571247566


##########
hudi-common/src/main/java/org/apache/hudi/common/table/cdc/HoodieCDCExtractor.java:
##########
@@ -114,6 +114,24 @@ public Map<HoodieFileGroupId, List<HoodieCDCFileSplit>> extractCDCFileSplits() {
     ValidationUtils.checkState(commits != null, "Empty commits");
 
     Map<HoodieFileGroupId, List<HoodieCDCFileSplit>> fgToCommitChanges = new HashMap<>();
+

Review Comment:
   No. I am saying that HoodieCDCExtractor.java has an inherent bug, which I am
   trying to fix here.
   
   Let's say the timeline is as follows:
   
   dc1
   dc2
   rc3
   dc4
   clean5 // cleans up data files from dc1 and dc2, since they were replaced by rc3
   
   On master, HoodieCDCExtractor goes over the commit metadata in the active
   timeline and tries to deduce the base files for the log files it finds. In
   this case, all data files from dc1 and dc2 have already been deleted by
   clean5, so on master we might hit a file-not-found error.
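
To make the scenario concrete, here is a minimal toy model of that timeline. It is only a sketch: the class, the methods, and the file names are all made up for illustration and are not Hudi APIs, and it assumes dc1 and dc2 wrote files in one file group that rc3 then replaced. It replays the timeline to see which files still exist, then lists the files that the commit metadata still references but that are gone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model of the timeline above; none of these names are Hudi APIs.
class CleanedTimelineSketch {

    // One timeline action: the files it wrote and the files it removed from storage.
    static final class Action {
        final String name;
        final List<String> written;
        final List<String> deleted;

        Action(String name, List<String> written, List<String> deleted) {
            this.name = name;
            this.written = written;
            this.deleted = deleted;
        }
    }

    // dc1, dc2, rc3, dc4, clean5, with placeholder file names.
    static List<Action> exampleTimeline() {
        List<String> none = Collections.emptyList();
        return Arrays.asList(
            new Action("dc1", Arrays.asList("fg1_dc1_base"), none),
            new Action("dc2", Arrays.asList("fg1_dc2_log"), none),
            new Action("rc3", Arrays.asList("fg2_rc3_base"), none),
            new Action("dc4", Arrays.asList("fg2_dc4_log"), none),
            // clean5 removes the files of the file group that rc3 replaced.
            new Action("clean5", none, Arrays.asList("fg1_dc1_base", "fg1_dc2_log")));
    }

    // Files physically present after replaying every action in order.
    static Set<String> filesOnStorage(List<Action> timeline) {
        Set<String> files = new LinkedHashSet<>();
        for (Action action : timeline) {
            files.addAll(action.written);
            files.removeAll(action.deleted);
        }
        return files;
    }

    // Files that the commit metadata still lists as written but that no longer exist.
    static List<String> danglingReferences(List<Action> timeline) {
        Set<String> present = filesOnStorage(timeline);
        List<String> missing = new ArrayList<>();
        for (Action action : timeline) {
            for (String file : action.written) {
                if (!present.contains(file)) {
                    missing.add(file);
                }
            }
        }
        return missing;
    }
}
```

In this model, anything that walks the metadata of dc1 and dc2 and then opens the files listed there would fail, because `danglingReferences` reports exactly those files as missing.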
   
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.