nsivabalan commented on code in PR #10915:
URL: https://github.com/apache/hudi/pull/10915#discussion_r1568072027
##########
hudi-common/src/main/java/org/apache/hudi/common/table/cdc/HoodieCDCExtractor.java:
##########
@@ -114,6 +114,24 @@ public Map<HoodieFileGroupId, List<HoodieCDCFileSplit>>
extractCDCFileSplits() {
ValidationUtils.checkState(commits != null, "Empty commits");
Map<HoodieFileGroupId, List<HoodieCDCFileSplit>> fgToCommitChanges = new
HashMap<>();
+
Review Comment:
hey @danny0405 @yihua : Can you folks rope in someone to assist with the CDC
fixes here?
I am not sure if this is the right fix, but let me go over the gist.
There was a test failure I was chasing, and I realized we might have some
gaps here.
I am not going to go over the exact scenario in which the test failed, but
the gist is this. Say the timeline is
dc1
dc2
rc3
dc4
clean5
In this case, we are looking to parse and fetch base files for dc1 and dc2
as well, which could have been cleaned up by the cleaner and so may no
longer exist.
Before this fix, the test was set up such that the timeline was
rc3
dc4
clean5
and so we did not hit this issue. When I fixed getLatestDeltaCommits to do
the right thing (which archival is using, btw), we ended up with the
timeline as
dc1
dc2
rc3
dc4
clean5
and `parseWriteStat` in HoodieCDCExtractor is failing:

    private HoodieCDCFileSplit parseWriteStat(
        HoodieFileGroupId fileGroupId,
        HoodieInstant instant,
        HoodieWriteStat writeStat,
        WriteOperationType operation) {
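To make the failure mode concrete, here is a minimal, self-contained sketch
(not Hudi code; all class names, file names, and the guard itself are
hypothetical) of the kind of existence check the extractor might need before
parsing a write stat whose base file could already have been removed by the
cleaner:

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class CdcExtractSketch {
  // Hypothetical stand-in for a write stat pointing at a base file path.
  record WriteStat(String commit, String baseFilePath) {}

  // Files that survived clean5 in the scenario above:
  // the dc1/dc2 base files are gone.
  static final Set<String> EXISTING_FILES =
      Set.of("fg1_rc3.parquet", "fg1_dc4.log");

  // An unguarded parse assumes the base file always exists and blows up;
  // this guarded version skips entries the cleaner already removed.
  static Optional<String> parseWriteStatGuarded(WriteStat stat) {
    if (!EXISTING_FILES.contains(stat.baseFilePath())) {
      // Base file cleaned up; cannot serve CDC from it.
      return Optional.empty();
    }
    return Optional.of(stat.commit() + " -> " + stat.baseFilePath());
  }

  public static void main(String[] args) {
    List<WriteStat> stats = List.of(
        new WriteStat("dc1", "fg1_dc1.parquet"),  // cleaned by clean5
        new WriteStat("dc2", "fg1_dc2.parquet"),  // cleaned by clean5
        new WriteStat("rc3", "fg1_rc3.parquet"),
        new WriteStat("dc4", "fg1_dc4.log"));
    long parsed = stats.stream()
        .filter(s -> parseWriteStatGuarded(s).isPresent())
        .count();
    System.out.println(parsed); // only rc3 and dc4 remain parseable
  }
}
```

Whether the right behavior is to skip such file groups or to fail with a
clearer error is exactly the question for whoever picks up the CDC fix.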
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]