Hey everyone, I’d like to share a design proposal for a new Iceberg Incremental CDC Source in Apache Beam:

*Design Doc*: https://docs.google.com/document/d/1_W6nDpiHKCk2oKrs-ICBn5IEeYm1AXhE0ItaK2-rrng
*Draft PR*: https://github.com/apache/beam/pull/37191

Currently, Beam’s IcebergIO supports streaming reads for append-only snapshots. This proposal introduces a native streaming source capable of processing full CDC events (inserts, updates, and deletes) using Iceberg’s IncrementalChangelogScan API.

The doc has an initial intro to Iceberg CDC, then jumps into some performance optimizations, specifically using snapshot, partition, and file metadata to bypass expensive shuffles when possible.

Would appreciate any feedback or thoughts on this approach!

Thanks,
Ahmed Abualsaud
