YannByron commented on code in PR #6256:
URL: https://github.com/apache/hudi/pull/6256#discussion_r947417129
##########
rfc/rfc-51/rfc-51.md
##########

```diff
@@ -148,20 +152,27 @@ hudi_cdc_table/

 Under a partition directory, the `.log` file with `CDCBlock` above will keep the changing data we have to materialize.

-There is an option to control what data is written to `CDCBlock`, that is `hoodie.table.cdc.supplemental.logging`. See the description of this config above.
+#### Write-on-indexing vs Write-on-compaction
```

Review Comment:
   The valid range of a CDC query is the same as `timetravel`'s: only instants in the active timeline can be queried. Two possible behaviors:
   1. Only c3 can be queried. If c3 needs to load a previous file slice that has already been cleaned, throw an exception, because part of the CDC data for c3 is lost.
   2. Alternatively, even when the file slices needed by instants in the current active timeline have already been cleaned, no exception is thrown. In that case some CDC data from the cleaned instants is silently lost. This behavior is like syncing the binlog from MySQL to Kafka: if a binlog segment is archived before it is synced, its changes can never be seen in Kafka.
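The fail-fast behavior in option 1 above can be sketched as a pre-read validation step. This is a minimal illustrative sketch, not actual Hudi code: the class, method, and slice names (`CdcRangeCheck`, `validateCdcQuery`, `fs1`...) are all hypothetical, and it only models the decision "throw when a required file slice has been cleaned" rather than real timeline or file-system access.

```java
import java.util.List;
import java.util.Set;

public class CdcRangeCheck {

    /** Hypothetical exception: raised when required file slices were removed by the cleaner. */
    static class CleanedDataException extends RuntimeException {
        CleanedDataException(String msg) { super(msg); }
    }

    /**
     * Option 1 from the review comment: before serving a CDC query for an
     * instant (e.g. c3), verify every file slice it must load still exists.
     * If any slice was cleaned, part of the change data is unrecoverable,
     * so fail fast instead of returning an incomplete result.
     *
     * @param requiredSlices file slices the queried instant must read
     * @param existingSlices file slices that survived cleaning
     */
    static void validateCdcQuery(List<String> requiredSlices, Set<String> existingSlices) {
        for (String slice : requiredSlices) {
            if (!existingSlices.contains(slice)) {
                throw new CleanedDataException(
                    "CDC data incomplete: required file slice " + slice + " was cleaned");
            }
        }
    }

    public static void main(String[] args) {
        Set<String> existing = Set.of("fs2", "fs3");

        // All required slices survived cleaning: the query may proceed.
        validateCdcQuery(List.of("fs3"), existing);
        System.out.println("query served");

        // fs1 was cleaned: option 1 rejects the query rather than
        // silently dropping change data (which would be option 2).
        try {
            validateCdcQuery(List.of("fs1", "fs3"), existing);
        } catch (CleanedDataException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Under option 2, the same check would instead skip the missing slices and serve whatever remains, which is the binlog-to-Kafka analogy in the comment: already-archived changes simply never appear downstream.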
