Re: SPIP: Auto CDC support for Apache Spark

DB Tsai Thu, 26 Mar 2026 21:45:07 -0700

+1

DB Tsai  |  https://www.dbtsai.com/  |  PGP 42E5B25A8F7A82C1


> On Mar 26, 2026, at 6:08 PM, Andreas Neumann <[email protected]> wrote:
> 
> Hi all,
> 
> I’d like to start a discussion on a new SPIP to introduce Auto CDC support to 
> Apache Spark.
> 
> SPIP Document: 
> https://docs.google.com/document/d/1Hp5BGEYJRHbk6J7XUph3bAPZKRQXKOuV1PEaqZMMRoQ/
> JIRA:  
> <https://issues.apache.org/jira/browse/SPARK-55668>https://issues.apache.org/jira/browse/SPARK-5566
> 
> Motivation
> With the upcoming introduction of standardized CDC support 
> <https://issues.apache.org/jira/browse/SPARK-55668>, Spark will soon have a 
> unified way to produce change data feeds. However, consuming these feeds and 
> applying them to a target table remains a significant challenge.
> 
> Common patterns like SCD Type 1 (maintaining a 1:1 replica) and SCD Type 2 
> (tracking full change history) often require hand-crafted, complex MERGE 
> logic. In distributed systems, these implementations are frequently 
> error-prone when handling deletions or out-of-order data.
> 
> Proposal
> This SPIP proposes a new "Auto CDC" flow type for Spark. It encapsulates the 
> complex logic for SCD types and out-of-order data, allowing data engineers to 
> configure a declarative flow instead of writing manual MERGE statements. This 
> feature will be available in both Python and SQL.
> 
> Example SQL:
> -- Produce a change feed
> CREATE STREAMING TABLE cdc.users AS
> SELECT * FROM STREAM my_table CHANGES FROM VERSION 10;
> 
> -- Consume the change feed
> CREATE FLOW flow
> AS AUTO CDC INTO
>   target
> FROM stream(cdc_data.users)
>   KEYS (userId)
>   APPLY AS DELETE WHEN operation = "DELETE"
>   SEQUENCE BY sequenceNum
>   COLUMNS * EXCEPT (operation, sequenceNum)
>   STORED AS SCD TYPE 2
>   TRACK HISTORY ON * EXCEPT (city);
> 
> Please review the full SPIP for the technical details. Looking forward to 
> your feedback and discussion!
> 
> Best regards,
> 
> Andreas
> 
>

signature.asc
Description: Message signed with OpenPGP

Re: SPIP: Auto CDC support for Apache Spark

Reply via email to