YannByron commented on PR #5436: URL: https://github.com/apache/hudi/pull/5436#issuecomment-1112830924
> I learned that Delta Lake also extracted the CDF for processing. I think that the CDF can be extracted to better control the data and specifications in the CDC. I also read Iceberg's design document, which basically plagiarizes and reuses the existing logic of Hudi.
> https://databricks.com/blog/2021/06/09/how-to-simplify-cdc-with-delta-lakes-change-data-feed.html

Yep. This RFC design basically follows the Databricks Delta Lake CDF. Based on my thorough research on Delta Lake, its approach is also to use as few CDC files as possible and to reuse as many data files as possible.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
