Hi Vaibhav,

The goal of this proposal is not to replace MERGE but to provide a simple abstraction for the common use case of CDC. MERGE itself is a very powerful operator and there will always be use cases outside of CDC that will require MERGE.

And thanks for spotting the typo in the SPIP. It is fixed now!

Cheers -Andreas

On Fri, Mar 27, 2026 at 10:53 AM Vaibhav Kumar <[email protected]> wrote:

> Hi Andrew,
>
> Thanks for sharing the SPIP, Does that mean the MERGE statement would be
> deprecated? Also I think there was a small typo I have suggested in the
> doc.
>
> Regards,
> Vaibhav
>
> On Fri, Mar 27, 2026 at 10:15 AM DB Tsai <[email protected]> wrote:
>
>> +1
>>
>> DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1
>>
>> On Mar 26, 2026, at 6:08 PM, Andreas Neumann <[email protected]> wrote:
>>
>> Hi all,
>>
>> I’d like to start a discussion on a new SPIP to introduce Auto CDC
>> support to Apache Spark.
>>
>> - SPIP Document:
>>   https://docs.google.com/document/d/1Hp5BGEYJRHbk6J7XUph3bAPZKRQXKOuV1PEaqZMMRoQ/
>> - JIRA: <https://issues.apache.org/jira/browse/SPARK-55668>
>>   https://issues.apache.org/jira/browse/SPARK-5566
>>
>> Motivation
>>
>> With the upcoming introduction of standardized CDC support
>> <https://issues.apache.org/jira/browse/SPARK-55668>, Spark will soon
>> have a unified way to produce change data feeds. However, consuming
>> these feeds and applying them to a target table remains a significant
>> challenge.
>>
>> Common patterns like SCD Type 1 (maintaining a 1:1 replica) and SCD Type
>> 2 (tracking full change history) often require hand-crafted, complex
>> MERGE logic. In distributed systems, these implementations are
>> frequently error-prone when handling deletions or out-of-order data.
>>
>> Proposal
>>
>> This SPIP proposes a new "Auto CDC" flow type for Spark. It encapsulates
>> the complex logic for SCD types and out-of-order data, allowing data
>> engineers to configure a declarative flow instead of writing manual MERGE
>> statements. This feature will be available in both Python and SQL.
>>
>> Example SQL:
>>
>> -- Produce a change feed
>> CREATE STREAMING TABLE cdc.users AS
>> SELECT * FROM STREAM my_table CHANGES FROM VERSION 10;
>>
>> -- Consume the change feed
>> CREATE FLOW flow
>> AS AUTO CDC INTO
>> target
>> FROM stream(cdc_data.users)
>> KEYS (userId)
>> APPLY AS DELETE WHEN operation = "DELETE"
>> SEQUENCE BY sequenceNum
>> COLUMNS * EXCEPT (operation, sequenceNum)
>> STORED AS SCD TYPE 2
>> TRACK HISTORY ON * EXCEPT (city);
>>
>> Please review the full SPIP for the technical details. Looking forward to
>> your feedback and discussion!
>>
>> Best regards,
>>
>> Andreas
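
For readers following the thread, here is a minimal plain-Python sketch of the SCD Type 1 behaviour the Motivation describes: one row per key, deletes applied, and late events ignored. It is not the SPIP's implementation or any Spark API. The function `apply_changes_scd1` and the event shape are hypothetical, and the field names (`userId`, `operation`, `sequenceNum`) are simply borrowed from the example SQL. It also assumes sequence numbers are comparable and strictly increasing per key.

```python
from typing import Any, Dict, Iterable

# Hypothetical illustration only: not part of the SPIP or any Spark API.
Row = Dict[str, Any]


def apply_changes_scd1(
    events: Iterable[Row],
    key: str = "userId",
    sequence_by: str = "sequenceNum",
    operation_col: str = "operation",
) -> Dict[Any, Row]:
    """Apply change events to an in-memory target, keeping one row per key."""
    # Highest sequence number seen per key. Deleted keys stay in this map
    # (acting as tombstones) so a late event cannot resurrect a deleted row.
    last_seq: Dict[Any, Any] = {}
    target: Dict[Any, Row] = {}

    for event in events:
        k, seq = event[key], event[sequence_by]
        if k in last_seq and seq <= last_seq[k]:
            continue  # out-of-order or duplicate event: ignore it
        last_seq[k] = seq
        if event[operation_col] == "DELETE":
            target.pop(k, None)
        else:
            # Mirrors COLUMNS * EXCEPT (operation, sequenceNum)
            target[k] = {
                c: v
                for c, v in event.items()
                if c not in (operation_col, sequence_by)
            }
    return target


events = [
    {"userId": 1, "city": "Berlin", "operation": "INSERT", "sequenceNum": 1},
    {"userId": 1, "city": "Paris", "operation": "UPDATE", "sequenceNum": 3},
    {"userId": 1, "city": "Rome", "operation": "UPDATE", "sequenceNum": 2},  # late
    {"userId": 2, "city": "Oslo", "operation": "INSERT", "sequenceNum": 1},
    {"userId": 2, "city": None, "operation": "DELETE", "sequenceNum": 4},
    {"userId": 2, "city": "Oslo", "operation": "UPDATE", "sequenceNum": 2},  # late
]
result = apply_changes_scd1(events)
```

Even this toy version has to track per-key sequence numbers and tombstones, which is the kind of bookkeeping that hand-written MERGE logic tends to get wrong. SCD Type 2 would additionally need validity ranges per row version, which this sketch does not attempt.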
