I think that one is only for Delta tables - I mean something more general,
with multiple pluggable sources, like Flink CDC: supporting CDC for SQL
Server, MySQL, PostgreSQL, Delta, and Iceberg for starters.
I imagine processing them with something like Spark Structured
Streaming - supporting CDC for various data sources and general databases.

While Iceberg and Delta can be read from various engines, other data
sources like MySQL, SQL Server, etc. can't - so to share such tables one
needs an easy way to transform them into Iceberg/Delta for data
lakes (you can't keep reading them from the operational database).
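Conceptually, replaying a CDC feed into a lake table boils down to an upsert/delete merge keyed on the primary key. A minimal Python sketch of those merge semantics (the `apply_cdc` function and event format here are purely illustrative, not an existing Spark or Flink API):

```python
def apply_cdc(table, events):
    """Replay insert/update/delete change events onto a dict keyed by primary key."""
    for op, key, row in events:
        if op in ("insert", "update"):
            table[key] = row          # upsert: the latest row image wins
        elif op == "delete":
            table.pop(key, None)      # drop the row if it exists
    return table

events = [
    ("insert", 1, {"id": 1, "name": "a"}),
    ("update", 1, {"id": 1, "name": "b"}),
    ("insert", 2, {"id": 2, "name": "c"}),
    ("delete", 2, None),
]
print(apply_cdc({}, events))  # only key 1, with its latest image, survives
```

In practice this merge would run inside a Structured Streaming sink (e.g. a MERGE INTO against Delta/Iceberg), with the source-specific log reading handled by pluggable connectors.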

Thanks,
Nimrod

On Sat, Feb 28, 2026, 15:54, Ángel Álvarez Pascua <
[email protected]> wrote:

> You mean something like AutoCDC from Databricks?
> https://docs.databricks.com/aws/en/ldp/cdc
>
> On Sat, Feb 28, 2026, 10:47, Nimrod Ofek <[email protected]> wrote:
>
>> Hi all,
>>
>> I would like to start a discussion about the possibility of
>> implementing a Change Data Capture (CDC) feature within Apache Spark,
>> similar to the existing, competing Flink CDC functionality
>> <https://nightlies.apache.org/flink/flink-cdc-docs-master/docs/connectors/flink-sources/overview/>
>> .
>>
>> I believe integrating such a feature would significantly enhance Spark's
>> capabilities for real-time data integration and ETL processes. I would
>> appreciate the opportunity to discuss how we might approach this proposal.
>>
>> Thank you for your time and consideration.
>>
>> Best regards,
>> Nimrod
>>
>>
