+1

On Mon, Apr 6, 2026 at 10:10 AM Anton Okolnychyi <[email protected]> wrote:

> +1 (non-binding)
>
> On Sat, Apr 4, 2026 at 11:55, Gengliang Wang <[email protected]> wrote:
>
>> +1
>>
>> On Sat, Apr 4, 2026 at 10:17 AM Xiao Li <[email protected]> wrote:
>>
>>> +1
>>>
>>> On Sat, Apr 4, 2026 at 09:45, vaquar khan <[email protected]> wrote:
>>>
>>>> +1
>>>>
>>>> Regards,
>>>> Viquar Khan
>>>>
>>>> On Sat, 4 Apr 2026 at 11:14, Lisa N. Cao <[email protected]> wrote:
>>>>
>>>>> +1 (non-binding)
>>>>>
>>>>> --
>>>>> LNC
>>>>>
>>>>> On Fri, Apr 3, 2026, 5:15 PM Shixiong Zhu <[email protected]> wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> On Fri, Apr 3, 2026 at 5:03 PM Mich Talebzadeh <[email protected]> wrote:
>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> Dr Mich Talebzadeh,
>>>>>>> Data Scientist | Distributed Systems (Spark) | Financial Forensics &
>>>>>>> Metadata Analytics | Transaction Reconstruction | Audit & Evidence-Based
>>>>>>> Analytics
>>>>>>>
>>>>>>> view my Linkedin profile
>>>>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>>>>
>>>>>>> On Fri, 3 Apr 2026 at 23:00, Andreas Neumann <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi Spark devs,
>>>>>>>>
>>>>>>>> I'd like to call a vote on the SPIP: *Auto CDC Support for Apache Spark*
>>>>>>>>
>>>>>>>> Motivation
>>>>>>>>
>>>>>>>> With the upcoming introduction of standardized CDC support
>>>>>>>> <https://issues.apache.org/jira/browse/SPARK-55668>, Spark will
>>>>>>>> soon have a unified way to produce change data feeds. However,
>>>>>>>> consuming these feeds and applying them to a target table remains
>>>>>>>> a significant challenge.
>>>>>>>>
>>>>>>>> Common patterns like SCD Type 1 (maintaining a 1:1 replica) and SCD
>>>>>>>> Type 2 (tracking full change history) often require hand-crafted,
>>>>>>>> complex MERGE logic. In distributed systems, these implementations
>>>>>>>> are frequently error-prone when handling deletions or out-of-order
>>>>>>>> data.
>>>>>>>>
>>>>>>>> Proposal
>>>>>>>>
>>>>>>>> This SPIP proposes a new "Auto CDC" flow type for Spark. It
>>>>>>>> encapsulates the complex logic for SCD types and out-of-order data,
>>>>>>>> allowing data engineers to configure a declarative flow instead of
>>>>>>>> writing manual MERGE statements. This feature will be available in
>>>>>>>> both Python and SQL.
>>>>>>>>
>>>>>>>> Example SQL:
>>>>>>>>
>>>>>>>> -- Produce a change feed
>>>>>>>> CREATE STREAMING TABLE cdc.users AS
>>>>>>>> SELECT * FROM STREAM my_table CHANGES FROM VERSION 10;
>>>>>>>>
>>>>>>>> -- Consume the change feed
>>>>>>>> CREATE FLOW flow
>>>>>>>> AS AUTO CDC INTO
>>>>>>>> target
>>>>>>>> FROM stream(cdc_data.users)
>>>>>>>> KEYS (userId)
>>>>>>>> APPLY AS DELETE WHEN operation = "DELETE"
>>>>>>>> SEQUENCE BY sequenceNum
>>>>>>>> COLUMNS * EXCEPT (operation, sequenceNum)
>>>>>>>> STORED AS SCD TYPE 2
>>>>>>>> TRACK HISTORY ON * EXCEPT (city);
>>>>>>>>
>>>>>>>> *Relevant Links:*
>>>>>>>>
>>>>>>>> - SPIP Document:
>>>>>>>>   https://docs.google.com/document/d/1Hp5BGEYJRHbk6J7XUph3bAPZKRQXKOuV1PEaqZMMRoQ/
>>>>>>>> - Discussion Thread:
>>>>>>>>   https://lists.apache.org/thread/j6sj9wo9odgdpgzlxtvhoy7szs0jplf7
>>>>>>>> - JIRA: <https://issues.apache.org/jira/browse/SPARK-55668>
>>>>>>>>   https://issues.apache.org/jira/browse/SPARK-56249
>>>>>>>>
>>>>>>>> *The vote will be open for at least 72 hours.* Please vote:
>>>>>>>>
>>>>>>>> [ ] +1: Accept the proposal as an official SPIP
>>>>>>>> [ ] +0
>>>>>>>> [ ] -1: I don't think this is a good idea because ...
>>>>>>>>
>>>>>>>> Cheers -Andreas
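
For anyone reading this thread in the archive: the quoted proposal mentions keys, a sequencing column, and delete handling for out-of-order data. The following is only a minimal plain-Python sketch of what those semantics could look like for the simpler SCD Type 1 case. It is not the SPIP's implementation and does not use any Spark API; the function name `apply_scd1` and the sample events are hypothetical, and the column names are borrowed from the example SQL.

```python
from typing import Any, Dict, Iterable


def apply_scd1(
    events: Iterable[Dict[str, Any]],
    key: str = "userId",
    seq: str = "sequenceNum",
    op: str = "operation",
) -> Dict[Any, Dict[str, Any]]:
    """Fold change events into a keyed replica; the highest sequence number wins.

    Hypothetical helper for illustration only (SCD Type 1, in memory).
    """
    target: Dict[Any, Dict[str, Any]] = {}  # key -> current row, control columns dropped
    last_seq: Dict[Any, Any] = {}           # key -> highest sequence applied, deletes included

    for event in events:
        k = event[key]
        # A late or duplicate event must not overwrite a newer change,
        # and must not resurrect a row that a newer delete removed.
        if k in last_seq and event[seq] <= last_seq[k]:
            continue
        last_seq[k] = event[seq]
        if event[op] == "DELETE":
            target.pop(k, None)
        else:
            target[k] = {c: v for c, v in event.items() if c not in (op, seq)}
    return target


# Sample events (made up) that arrive out of order.
events = [
    {"userId": 1, "city": "Kyiv", "operation": "INSERT", "sequenceNum": 1},
    {"userId": 1, "city": "Lviv", "operation": "UPDATE", "sequenceNum": 3},
    {"userId": 1, "city": "Odesa", "operation": "UPDATE", "sequenceNum": 2},   # arrives late
    {"userId": 2, "city": "Berlin", "operation": "DELETE", "sequenceNum": 5},
    {"userId": 2, "city": "Berlin", "operation": "INSERT", "sequenceNum": 4},  # arrives after its delete
]
replica = apply_scd1(events)
```

Remembering the last applied sequence number even for deleted keys is the design choice that keeps a late insert from reviving a deleted row. SCD Type 2 would additionally need validity intervals per row version, which this sketch leaves out.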
