YannByron commented on code in PR #6256:
URL: https://github.com/apache/hudi/pull/6256#discussion_r951078443


##########
rfc/rfc-51/rfc-51.md:
##########
@@ -148,20 +152,27 @@ hudi_cdc_table/
 
 Under a partition directory, the `.log` file with `CDCBlock` above will keep 
the changing data we have to materialize.
 
-There is an option to control what data is written to `CDCBlock`, that is 
`hoodie.table.cdc.supplemental.logging`. See the description of this config 
above.
+#### Write-on-indexing vs Write-on-compaction

Review Comment:
   OK, but one thing needs to be noted if we persist MOR CDC data during 
compaction. @prasannarajaperumal @xushiyan 
   An example first: a record (id=1, name=x1) is in the base file; commit t1 
updates name to x2 (in logFile1), and commit t2 updates it to x3 (in logFile2). 
CDC should return two change records (x1->x2, x2->x3).
   The current compaction implementation calls `HoodieMergedLogRecordScanner` 
to get the log records first, then finishes the compaction with 
`HoodieMergeHandler`. But the log records from `HoodieMergedLogRecordScanner` 
have already been combined in advance (only x3 survives for id=1), so we lose 
some CDC info: the intermediate change x1->x2 can no longer be produced.
   So if we want to persist CDC data during compaction for MOR tables, we have 
to update the related code: `HoodieMergedLogRecordScanner` to make it return 
non-combined records, and `HoodieCompactor` and `HoodieMergeHandler` to adapt 
to that change.
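
   To make the example concrete, here is a minimal, language-agnostic sketch 
in Python. It is not Hudi code: `LogRecord`, `merged_scan`, `unmerged_scan`, 
and `cdc_changes` are hypothetical names used only to model the difference 
between a scanner that pre-combines log records per key and one that returns 
them uncombined.

```python
from collections import namedtuple

# Hypothetical, simplified log record: commit time, record key, new value.
LogRecord = namedtuple("LogRecord", ["commit", "key", "name"])


def merged_scan(log_records):
    """Model a pre-combining scanner: keep only the latest record per key."""
    latest = {}
    for rec in sorted(log_records, key=lambda r: r.commit):
        latest[rec.key] = rec
    return list(latest.values())


def unmerged_scan(log_records):
    """Model a non-combining scanner: return every record in commit order."""
    return sorted(log_records, key=lambda r: r.commit)


def cdc_changes(base, scanned):
    """Replay scanned records over the base values, emitting (key, before, after)."""
    current = dict(base)
    changes = []
    for rec in scanned:
        changes.append((rec.key, current.get(rec.key), rec.name))
        current[rec.key] = rec.name
    return changes


base = {1: "x1"}  # base file: id=1, name=x1
logs = [LogRecord("t1", 1, "x2"), LogRecord("t2", 1, "x3")]

print(cdc_changes(base, merged_scan(logs)))    # [(1, 'x1', 'x3')]
print(cdc_changes(base, unmerged_scan(logs)))  # [(1, 'x1', 'x2'), (1, 'x2', 'x3')]
```

   With the pre-combined input only one change (x1->x3) can be derived, while 
the uncombined input yields both expected changes.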


