fightBoxing commented on PR #15282: URL: https://github.com/apache/iceberg/pull/15282#issuecomment-3956672031
> Thanks for the PR @fightBoxing!
>
> 1. Do you mind briefly describing the approach taken in this PR? I'm assuming this does some kind of merge-on-read. What is the general architecture? Are there any limitations to the approach taken in this PR?
> 2. Could you remove all the files except for Flink 2.1? We usually merge support for the latest version first and then backport to the older ones.

1. CDC Functional Architecture Overview

Methodology Used: Changelog Inference Based on Snapshot Metadata (Not Merge-on-Read)

This PR is not a traditional merge-on-read solution. Instead, it infers changelogs from Iceberg snapshot metadata. By comparing the status (ADDED/DELETED) of manifest entries between two snapshots, it derives INSERT and DELETE change events, which are then converted on the Flink side into a CDC stream whose rows carry RowKind tags.

<img width="923" height="1616" alt="Clipboard_Screenshot_1771993182" src="https://github.com/user-attachments/assets/aaf8d08a-be44-449a-b2ca-12227c4bdf95" />
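To make the inference step described above concrete, here is a minimal, language-neutral sketch in Python. It is not the code in this PR. `ManifestEntry`, `EntryStatus`, and `infer_changelog` are hypothetical names, and rows are held in memory rather than read from data files. The only details taken from the real systems are the manifest entry status values in the Iceberg spec (0 = EXISTING, 1 = ADDED, 2 = DELETED) and Flink's `RowKind` short strings (`+I`, `-D`).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Iterator, Set, Tuple


class EntryStatus(Enum):
    # Status codes as defined for manifest entries in the Iceberg table spec.
    EXISTING = 0
    ADDED = 1
    DELETED = 2


@dataclass(frozen=True)
class ManifestEntry:
    # Hypothetical stand-in for a manifest entry. A real entry points at a
    # data file; here the rows are inlined to keep the sketch self-contained.
    status: EntryStatus
    snapshot_id: int
    rows: Tuple[tuple, ...]


# Map entry status to Flink RowKind short strings. EXISTING entries carry no
# change, so they are deliberately absent from the mapping.
ROW_KIND = {EntryStatus.ADDED: "+I", EntryStatus.DELETED: "-D"}


def infer_changelog(
    entries: Iterable[ManifestEntry], changed_snapshot_ids: Set[int]
) -> Iterator[Tuple[str, tuple]]:
    """Yield (row_kind, row) pairs for entries touched by the given snapshots."""
    for entry in entries:
        # Only entries written by a snapshot in the scanned range count.
        if entry.snapshot_id not in changed_snapshot_ids:
            continue
        kind = ROW_KIND.get(entry.status)
        if kind is None:
            continue  # EXISTING: file carried over unchanged
        for row in entry.rows:
            yield kind, row
```

Under this simplified model, a row-level update shows up as a `-D` from the removed file and a `+I` from the added file. Pairing them into update-before and update-after events would need extra logic that is not sketched here.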
