YannByron commented on PR #5436:
URL: https://github.com/apache/hudi/pull/5436#issuecomment-1112830924

   > I learned that Delta Lake also extracts the CDF for processing. I think the CDF can be extracted to better control the data and specifications in the CDC. I also read Iceberg's design document, which basically plagiarizes and reuses Hudi's existing logic.
   > https://databricks.com/blog/2021/06/09/how-to-simplify-cdc-with-delta-lakes-change-data-feed.html
   
   Yep. This RFC design closely follows Databricks Delta Lake's Change Data Feed (CDF). Based on my thorough research into Delta Lake, its approach is likewise to write as few CDC files as possible and reuse as many existing data files as possible.
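A toy model may make the "few CDC files, many reused data files" principle more concrete. This is a hypothetical sketch, not Hudi's or Delta Lake's actual code: `Op`, `FileAction`, and `plan_change_sources` are invented names. It assumes, loosely mirroring Delta Lake's documented behavior (insert-only operations and full-partition deletes are computed from the transaction log instead of being written to `_change_data`), that change rows for a wholly new or wholly removed file can be read straight from that data file, and that only files rewritten with mixed updates/deletes need a separate CDC file.

```python
from dataclasses import dataclass
from enum import Enum


class Op(Enum):
    # Hypothetical classification of what a commit did to one data file.
    INSERT_ONLY = "insert_only"  # brand-new file: every row is an insert
    DELETE_FILE = "delete_file"  # whole file removed: every row is a delete
    REWRITE = "rewrite"          # file rewritten with a mix of updates/deletes


@dataclass
class FileAction:
    path: str
    op: Op


def plan_change_sources(actions):
    """Split file actions into those whose change rows can be derived by
    reusing the data file itself, and those that need an extra CDC file.
    (Hypothetical helper for illustration only.)"""
    reused, needs_cdc = [], []
    for action in actions:
        if action.op in (Op.INSERT_ONLY, Op.DELETE_FILE):
            # Every row shares the same change type, so no CDC file is written.
            reused.append(action.path)
        else:
            # Row-level before/after images must be materialized separately.
            needs_cdc.append(action.path)
    return reused, needs_cdc


actions = [
    FileAction("f1.parquet", Op.INSERT_ONLY),
    FileAction("f2.parquet", Op.DELETE_FILE),
    FileAction("f3.parquet", Op.REWRITE),
]
reused, needs_cdc = plan_change_sources(actions)
print(reused, needs_cdc)
```

In this sketch only `f3.parquet` would require a CDC file, while the other two files serve change queries directly, which is the trade-off the design aims for: extra write cost is paid only where the data files alone cannot express the change.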

