I think that one is only for Delta tables. I mean something more general, with multiple pluggable sources, like Flink CDC: supporting CDC for SQL Server, MySQL, PostgreSQL, Delta, and Iceberg for starters, and probably processing the change streams with something like Spark Structured Streaming, so it covers a variety of data sources and general-purpose databases.
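To make the idea concrete, here is a minimal, engine-agnostic sketch of the "apply" semantics such a connector would need. This is plain Python, not a proposed Spark API; the event shape is a hypothetical Debezium-style record (`op` in `c`/`u`/`d`, keyed by primary key), and the target table is modeled as a simple dict snapshot:

```python
# Sketch of CDC "apply" semantics, independent of any engine.
# Event shape is hypothetical (Debezium-like): op in {"c", "u", "d"},
# keyed by primary key; the target table is modeled as a dict snapshot.

def apply_cdc(snapshot, events):
    """Fold a stream of change events into a table snapshot keyed by PK."""
    for ev in events:
        op, key = ev["op"], ev["key"]
        if op in ("c", "u"):          # create / update: upsert the row
            snapshot[key] = ev["after"]
        elif op == "d":               # delete: drop the row if present
            snapshot.pop(key, None)
    return snapshot

events = [
    {"op": "c", "key": 1, "after": {"id": 1, "name": "alice"}},
    {"op": "c", "key": 2, "after": {"id": 2, "name": "bob"}},
    {"op": "u", "key": 1, "after": {"id": 1, "name": "alicia"}},
    {"op": "d", "key": 2},
]

print(apply_cdc({}, events))  # → {1: {'id': 1, 'name': 'alicia'}}
```

In Spark terms, the same fold would roughly correspond to a MERGE INTO against a Delta/Iceberg target per micro-batch (e.g. via foreachBatch), which is effectively what Flink CDC's sinks do today.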
While Iceberg and Delta tables can be read by many engines, operational sources like MySQL and SQL Server cannot, so sharing those tables requires an easy way to transform them into Iceberg/Delta for the data lake (you can't keep reading them directly from the operational database all the time).

Thanks,
Nimrod

On Sat, Feb 28, 2026, 15:54, Ángel Álvarez Pascua <[email protected]> wrote:

> You mean something like AutoCDC from Databricks?
> https://docs.databricks.com/aws/en/ldp/cdc
>
> On Sat, Feb 28, 2026, 10:47, Nimrod Ofek <[email protected]> wrote:
>
>> Hi all,
>>
>> I would like to start a discussion about the possibility of
>> implementing a Change Data Capture (CDC) feature within Apache Spark,
>> similar to the existing, competing Flink CDC functionality
>> <https://nightlies.apache.org/flink/flink-cdc-docs-master/docs/connectors/flink-sources/overview/>.
>>
>> I believe integrating such a feature would significantly enhance Spark's
>> capabilities for real-time data integration and ETL processes. I would
>> appreciate the opportunity to discuss how we might approach this proposal.
>>
>> Thank you for your time and consideration.
>>
>> Best regards,
>> Nimrod
