nsivabalan commented on code in PR #10915:
URL: https://github.com/apache/hudi/pull/10915#discussion_r1568072027


##########
hudi-common/src/main/java/org/apache/hudi/common/table/cdc/HoodieCDCExtractor.java:
##########
@@ -114,6 +114,24 @@ public Map<HoodieFileGroupId, List<HoodieCDCFileSplit>> extractCDCFileSplits() {
     ValidationUtils.checkState(commits != null, "Empty commits");
 
     Map<HoodieFileGroupId, List<HoodieCDCFileSplit>> fgToCommitChanges = new HashMap<>();
+

Review Comment:
   hey @danny0405 @yihua: can you folks rope in someone to assist with the CDC fixes here?
   I am not sure if this is the right fix.
   
   But let me go over the gist.
   There was a test failure I was trying to chase down, and I realized we might have some gaps here.
   
   I am not going to go over the exact scenario in which the test failed, but the gist is this.
   
   dc1
   dc2
   rc3
   dc4
   clean5
   
   In this case, we are looking to parse and fetch base files for dc1 and dc2 as well, which could have been removed by the cleaner and may no longer exist.
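   To make the gist concrete, here is a minimal, self-contained sketch (all names hypothetical, not Hudi APIs): the cleaner retains files only from some earliest instant onward, so base files written by anything older (dc1, dc2 here) may already be deleted when CDC extraction goes looking for them.
   
   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.List;
   
   // Toy model of the scenario above (hypothetical names, not Hudi classes).
   // After clean5 runs, base files written by instants older than the
   // cleaner's earliest retained instant may no longer exist on storage.
   public class CleanerRetentionSketch {
   
     // Returns the instants whose base files the cleaner may have deleted,
     // i.e. everything strictly before the earliest retained instant.
     public static List<String> possiblyCleaned(List<String> timeline,
                                                String earliestRetained) {
       List<String> cleaned = new ArrayList<>();
       for (String instant : timeline) {
         if (instant.equals(earliestRetained)) {
           break; // instants from here on are retained
         }
         cleaned.add(instant);
       }
       return cleaned;
     }
   
     public static void main(String[] args) {
       List<String> timeline = Arrays.asList("dc1", "dc2", "rc3", "dc4");
       // If clean5 retained files from rc3 onward, dc1 and dc2 may be gone.
       System.out.println(possiblyCleaned(timeline, "rc3")); // [dc1, dc2]
     }
   }
   ```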
   
   Before this fix, the test was set up such that the timeline was
   rc3
   dc4
   clean5
   
   and so we did not hit this issue. When I made the fix to ensure getLatestDeltaCommits does the right thing (which archival uses, btw), we ended up with the timeline as
   dc1
   dc2
   rc3
   dc4
   clean5
   
   and   
   ```
   private HoodieCDCFileSplit parseWriteStat(
         HoodieFileGroupId fileGroupId,
         HoodieInstant instant,
         HoodieWriteStat writeStat,
         WriteOperationType operation) {
   ``` 
   in HoodieCDCExtractor is failing. 
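   One possible shape of a fix, as a hedged sketch with made-up names (this is not the actual HoodieCDCExtractor code): have the parseWriteStat path tolerate a base file the cleaner already removed, returning an empty result the caller can skip instead of failing outright.
   
   ```java
   import java.util.HashSet;
   import java.util.Optional;
   import java.util.Set;
   
   // Hedged sketch (hypothetical names, not the real HoodieCDCExtractor):
   // a parseWriteStat-style lookup that tolerates base files the cleaner
   // has already deleted, so extraction can skip them instead of failing.
   public class TolerantParseSketch {
   
     // filesOnStorage stands in for a file-system existence check.
     public static Optional<String> parseWriteStat(Set<String> filesOnStorage,
                                                   String baseFilePath) {
       if (!filesOnStorage.contains(baseFilePath)) {
         // Base file from an old instant (e.g. dc1/dc2) was cleaned up;
         // signal "no split" rather than throwing.
         return Optional.empty();
       }
       return Optional.of("cdc-split:" + baseFilePath);
     }
   
     public static void main(String[] args) {
       Set<String> files = new HashSet<>();
       files.add("fg1_dc4.parquet"); // only dc4's base file survived clean5
       System.out.println(parseWriteStat(files, "fg1_dc1.parquet")); // Optional.empty
       System.out.println(parseWriteStat(files, "fg1_dc4.parquet")); // Optional[cdc-split:fg1_dc4.parquet]
     }
   }
   ```
   
   Whether skipping is the right semantics for CDC reads (versus failing fast) is exactly the kind of question the folks pinged above would need to weigh in on.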
   
   
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to