+1

On Sat, Apr 4, 2026 at 10:17 AM Xiao Li <[email protected]> wrote:
> +1
>
> On Sat, Apr 4, 2026 at 09:45, vaquar khan <[email protected]> wrote:
>
>> +1
>>
>> Regards,
>> Viquar Khan
>>
>> On Sat, 4 Apr 2026 at 11:14, Lisa N. Cao <[email protected]>
>> wrote:
>>
>>> +1 (non-binding)
>>>
>>> --
>>> LNC
>>>
>>> On Fri, Apr 3, 2026, 5:15 PM Shixiong Zhu <[email protected]> wrote:
>>>
>>>> +1
>>>>
>>>> On Fri, Apr 3, 2026 at 5:03 PM Mich Talebzadeh <
>>>> [email protected]> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> Dr Mich Talebzadeh,
>>>>> Data Scientist | Distributed Systems (Spark) | Financial Forensics &
>>>>> Metadata Analytics | Transaction Reconstruction | Audit & Evidence-Based
>>>>> Analytics
>>>>>
>>>>> view my Linkedin profile
>>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>>
>>>>> On Fri, 3 Apr 2026 at 23:00, Andreas Neumann <[email protected]> wrote:
>>>>>
>>>>>> Hi Spark devs,
>>>>>>
>>>>>> I'd like to call a vote on the SPIP: *Auto CDC Support for Apache
>>>>>> Spark*
>>>>>>
>>>>>> Motivation
>>>>>>
>>>>>> With the upcoming introduction of standardized CDC support
>>>>>> <https://issues.apache.org/jira/browse/SPARK-55668>, Spark will soon
>>>>>> have a unified way to produce change data feeds. However, consuming
>>>>>> these feeds and applying them to a target table remains a significant
>>>>>> challenge.
>>>>>>
>>>>>> Common patterns like SCD Type 1 (maintaining a 1:1 replica) and SCD
>>>>>> Type 2 (tracking full change history) often require hand-crafted,
>>>>>> complex MERGE logic. In distributed systems, these implementations
>>>>>> are frequently error-prone when handling deletions or out-of-order data.
>>>>>>
>>>>>> Proposal
>>>>>>
>>>>>> This SPIP proposes a new "Auto CDC" flow type for Spark. It
>>>>>> encapsulates the complex logic for SCD types and out-of-order data,
>>>>>> allowing data engineers to configure a declarative flow instead of
>>>>>> writing manual MERGE statements. This feature will be available in
>>>>>> both Python and SQL.
>>>>>>
>>>>>> Example SQL:
>>>>>>
>>>>>> -- Produce a change feed
>>>>>> CREATE STREAMING TABLE cdc.users AS
>>>>>> SELECT * FROM STREAM my_table CHANGES FROM VERSION 10;
>>>>>>
>>>>>> -- Consume the change feed
>>>>>> CREATE FLOW flow
>>>>>> AS AUTO CDC INTO target
>>>>>> FROM stream(cdc.users)
>>>>>> KEYS (userId)
>>>>>> APPLY AS DELETE WHEN operation = "DELETE"
>>>>>> SEQUENCE BY sequenceNum
>>>>>> COLUMNS * EXCEPT (operation, sequenceNum)
>>>>>> STORED AS SCD TYPE 2
>>>>>> TRACK HISTORY ON * EXCEPT (city);
>>>>>>
>>>>>> *Relevant Links:*
>>>>>>
>>>>>>    - SPIP Document:
>>>>>>    https://docs.google.com/document/d/1Hp5BGEYJRHbk6J7XUph3bAPZKRQXKOuV1PEaqZMMRoQ/
>>>>>>    - *Discussion Thread:*
>>>>>>    https://lists.apache.org/thread/j6sj9wo9odgdpgzlxtvhoy7szs0jplf7
>>>>>>    - JIRA: <https://issues.apache.org/jira/browse/SPARK-55668>
>>>>>>    https://issues.apache.org/jira/browse/SPARK-56249
>>>>>>
>>>>>> *The vote will be open for at least 72 hours.* Please vote:
>>>>>>
>>>>>> [ ] +1: Accept the proposal as an official SPIP
>>>>>> [ ] +0
>>>>>> [ ] -1: I don't think this is a good idea because ...
>>>>>>
>>>>>> Cheers
>>>>>> -Andreas
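The sketch below is plain Python, one of the two languages the proposal targets. It models what the SCD Type 2 flow in the quoted example SQL is meant to do: order events by `SEQUENCE BY`, close the open version on `APPLY AS DELETE`, and open a new version only when a tracked column changes. It is not the SPIP's implementation or API. Several details are assumptions made for illustration:

- the function name `apply_scd2`;
- the `__START_AT`/`__END_AT` validity columns;
- the rule that a change touching only untracked columns (here `city`) overwrites the current version in place.

It also reorders events only within one in-memory batch.

```python
"""Hypothetical in-memory model of an SCD Type 2 Auto CDC flow (not the SPIP API).

Assumptions: events are dicts; history rows carry __START_AT/__END_AT columns;
changes to untracked columns update the open version in place.
"""
from typing import Any, Dict, FrozenSet, Iterable, List

Row = Dict[str, Any]


def apply_scd2(
    events: Iterable[Row],
    key: str = "userId",                 # KEYS (userId)
    sequence_col: str = "sequenceNum",   # SEQUENCE BY sequenceNum
    op_col: str = "operation",           # APPLY AS DELETE WHEN operation = "DELETE"
    untracked: FrozenSet[str] = frozenset({"city"}),  # TRACK HISTORY ON * EXCEPT (city)
) -> List[Row]:
    """Return SCD Type 2 history rows built from a batch of change events."""
    history: List[Row] = []
    current: Dict[Any, Row] = {}  # key -> currently open version (same object as in history)

    # Process events in sequence order, not arrival order (out-of-order handling).
    for event in sorted(events, key=lambda e: e[sequence_col]):
        k, seq = event[key], event[sequence_col]
        open_row = current.get(k)

        if event[op_col] == "DELETE":
            # A delete closes the open version; no new version is opened.
            if open_row is not None:
                open_row["__END_AT"] = seq
                del current[k]
            continue

        # COLUMNS * EXCEPT (operation, sequenceNum): drop CDC bookkeeping columns.
        values = {c: v for c, v in event.items() if c not in (op_col, sequence_col)}

        tracked_unchanged = open_row is not None and all(
            open_row.get(c) == v for c, v in values.items() if c not in untracked
        )
        if tracked_unchanged:
            # Only untracked columns changed: overwrite in place, no new history row.
            open_row.update(values)
            continue

        if open_row is not None:
            open_row["__END_AT"] = seq  # close the superseded version
        new_row = {**values, "__START_AT": seq, "__END_AT": None}
        history.append(new_row)
        current[k] = new_row
    return history
```

Consider this batch: an event with `sequenceNum` 3 arrives first, that event changes only `city`, and a delete follows. Under these assumptions it yields two versions of user 1. The first is `Ann`/`Rome`, valid from 1 to 2. The second is `Anna`/`Oslo`, valid from 2 to 4. The expected value of a manual MERGE would still need to reproduce this result across micro-batches, including late events for keys whose versions are already closed. The proposal says Auto CDC encapsulates that logic.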
