Hi Nimrod, For your awareness, I have opened a discussion thread on the mailing list. You can find it here: https://lists.apache.org/thread/dhxx6pohs7fvqc3knzhtoj4tbcgrwxts
On Sat, Feb 28, 2026 at 6:39 AM Ángel Álvarez Pascua <[email protected]> wrote:
> I fully agree with your idea. A general, pluggable CDC framework in Spark
> would fill a real gap for integrating operational databases with lakehouse
> formats using Structured Streaming.
>
> I also believe it should integrate seamlessly with declarative pipelines,
> allowing users to declare intent (source, tables, sink, apply semantics)
> while Spark manages the underlying streaming jobs.
>
> On Sat, Feb 28, 2026, 15:20, Nimrod Ofek <[email protected]> wrote:
>
>> I think that one is only for Delta tables. I mean something more general,
>> with multiple pluggable sources, like Flink CDC: supporting CDC for SQL
>> Server, MySQL, PostgreSQL, Delta, and Iceberg for starters.
>> Processing would probably be done with something like Spark Structured
>> Streaming, supporting CDC for various data sources and general databases.
>>
>> While Iceberg and Delta can be read from various engines, other data
>> sources like MySQL, SQL Server, etc. cannot. So to share such tables, one
>> needs an easy way to transform them into Iceberg/Delta for data lakes
>> (you can't read them all the time from the operational database).
>>
>> Thanks,
>> Nimrod
>>
>> On Sat, Feb 28, 2026, 15:54, Ángel Álvarez Pascua <
>> [email protected]> wrote:
>>
>>> You mean something like AutoCDC from Databricks?
>>> https://docs.databricks.com/aws/en/ldp/cdc
>>>
>>> On Sat, Feb 28, 2026, 10:47, Nimrod Ofek <[email protected]>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I would like to start a discussion about the possibility of
>>>> implementing a Change Data Capture (CDC) feature within Apache Spark,
>>>> similar to the existing, competing Flink CDC functionality
>>>> <https://nightlies.apache.org/flink/flink-cdc-docs-master/docs/connectors/flink-sources/overview/>.
>>>>
>>>> I believe integrating such a feature would significantly enhance
>>>> Spark's capabilities for real-time data integration and ETL processes. I
>>>> would appreciate the opportunity to discuss how we might approach this
>>>> proposal.
>>>>
>>>> Thank you for your time and consideration.
>>>>
>>>> Best regards,
>>>> Nimrod
