YannByron commented on code in PR #6256: URL: https://github.com/apache/hudi/pull/6256#discussion_r951078443
##########
rfc/rfc-51/rfc-51.md:
##########
@@ -148,20 +152,27 @@ hudi_cdc_table/
 Under a partition directory, the `.log` file with `CDCBlock` above will keep the changing data we have to materialize.
-There is an option to control what data is written to `CDCBlock`, that is `hoodie.table.cdc.supplemental.logging`. See the description of this config above.
+#### Write-on-indexing vs Write-on-compaction

Review Comment:
OK, but one thing needs to be noted if we persist MOR CDC data during compaction. @prasannarajaperumal @xushiyan

An example first: a record (id=1, name=x1) is in the base file. Commit t1 updates name to x2 (in logFile1), and commit t2 updates it to x3 (in logFile2). CDC should return two change records (x1->x2, x2->x3).

The current compaction implementation first calls `HoodieMergedLogRecordScanner` to get the log records, then finishes the compaction with `HoodieMergeHandler`. But the log records from `HoodieMergedLogRecordScanner` have already been combined in advance, so we lose some CDC info: in the example above, only the latest value x3 reaches the merge step, and the intermediate x1->x2 change can no longer be derived.

So if we want to persist CDC data during compaction for MOR tables, we have to update the related code: `HoodieMergedLogRecordScanner` must return non-combined records, and `HoodieCompactor` and `HoodieMergeHandler` must adapt to that change.
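To make the example above concrete, here is a minimal toy sketch in Python. It is not Hudi code: `merged_scan`, `cdc_from_merged`, and `cdc_from_unmerged` are hypothetical helpers, and the dict/list data model is an assumption made only for illustration. It shows why combining log records before the merge step drops the intermediate change.

```python
from typing import Dict, List, Optional, Tuple

# Toy model (an assumption, not Hudi's API):
# the base file is a dict of key -> value,
# each log file is an ordered list of (key, new_value) updates.
LogFile = List[Tuple[int, str]]
Change = Tuple[int, Optional[str], str]  # (key, before, after)


def merged_scan(log_files: List[LogFile]) -> Dict[int, str]:
    """Combine log records per key in advance, keeping only the latest value."""
    latest: Dict[int, str] = {}
    for log in log_files:
        for key, value in log:
            latest[key] = value
    return latest


def cdc_from_merged(base: Dict[int, str], log_files: List[LogFile]) -> List[Change]:
    """Derive changes after the log records were combined: one change per key."""
    return [(key, base.get(key), new) for key, new in merged_scan(log_files).items()]


def cdc_from_unmerged(base: Dict[int, str], log_files: List[LogFile]) -> List[Change]:
    """Derive changes by replaying every log record in commit order."""
    current = dict(base)
    changes: List[Change] = []
    for log in log_files:
        for key, value in log:
            changes.append((key, current.get(key), value))
            current[key] = value
    return changes


base = {1: "x1"}                        # record (id=1, name=x1) in the base file
log_files = [[(1, "x2")], [(1, "x3")]]  # logFile1 at t1, logFile2 at t2

print(cdc_from_merged(base, log_files))    # [(1, 'x1', 'x3')]
print(cdc_from_unmerged(base, log_files))  # [(1, 'x1', 'x2'), (1, 'x2', 'x3')]
```

With pre-combined records, only x1->x3 is visible. Replaying the non-combined records recovers both x1->x2 and x2->x3, which is what the scanner would need to expose for compaction-time CDC.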
