+1

vaquar khan <[email protected]> wrote on Sat, 4 Apr 2026 at 09:45:
> +1
>
> Regards,
> Viquar Khan
>
> On Sat, 4 Apr 2026 at 11:14, Lisa N. Cao <[email protected]> wrote:
>
>> +1 (non-binding)
>>
>> --
>> LNC
>>
>> On Fri, Apr 3, 2026, 5:15 PM Shixiong Zhu <[email protected]> wrote:
>>
>>> +1
>>>
>>> On Fri, Apr 3, 2026 at 5:03 PM Mich Talebzadeh <[email protected]> wrote:
>>>
>>>> +1
>>>>
>>>> Dr Mich Talebzadeh,
>>>> Data Scientist | Distributed Systems (Spark) | Financial Forensics &
>>>> Metadata Analytics | Transaction Reconstruction | Audit & Evidence-Based
>>>> Analytics
>>>>
>>>> view my LinkedIn profile
>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>
>>>> On Fri, 3 Apr 2026 at 23:00, Andreas Neumann <[email protected]> wrote:
>>>>
>>>>> Hi Spark devs,
>>>>>
>>>>> I'd like to call a vote on the SPIP: Auto CDC Support for Apache
>>>>> Spark
>>>>>
>>>>> Motivation
>>>>>
>>>>> With the upcoming introduction of standardized CDC support
>>>>> <https://issues.apache.org/jira/browse/SPARK-55668>, Spark will soon
>>>>> have a unified way to produce change data feeds. However, consuming these
>>>>> feeds and applying them to a target table remains a significant challenge.
>>>>>
>>>>> Common patterns like SCD Type 1 (maintaining a 1:1 replica) and SCD
>>>>> Type 2 (tracking full change history) often require hand-crafted,
>>>>> complex MERGE logic. In distributed systems, these implementations
>>>>> are frequently error-prone when handling deletions or out-of-order data.
>>>>>
>>>>> Proposal
>>>>>
>>>>> This SPIP proposes a new "Auto CDC" flow type for Spark. It
>>>>> encapsulates the complex logic for SCD types and out-of-order data,
>>>>> allowing data engineers to configure a declarative flow instead of writing
>>>>> manual MERGE statements. This feature will be available in both Python
>>>>> and SQL.
>>>>>
>>>>> Example SQL:
>>>>>
>>>>> -- Produce a change feed
>>>>> CREATE STREAMING TABLE cdc.users AS
>>>>> SELECT * FROM STREAM my_table CHANGES FROM VERSION 10;
>>>>>
>>>>> -- Consume the change feed
>>>>> CREATE FLOW flow
>>>>> AS AUTO CDC INTO
>>>>> target
>>>>> FROM stream(cdc_data.users)
>>>>> KEYS (userId)
>>>>> APPLY AS DELETE WHEN operation = "DELETE"
>>>>> SEQUENCE BY sequenceNum
>>>>> COLUMNS * EXCEPT (operation, sequenceNum)
>>>>> STORED AS SCD TYPE 2
>>>>> TRACK HISTORY ON * EXCEPT (city);
>>>>>
>>>>> Relevant Links:
>>>>>
>>>>> - SPIP Document:
>>>>>   https://docs.google.com/document/d/1Hp5BGEYJRHbk6J7XUph3bAPZKRQXKOuV1PEaqZMMRoQ/
>>>>> - Discussion Thread:
>>>>>   https://lists.apache.org/thread/j6sj9wo9odgdpgzlxtvhoy7szs0jplf7
>>>>> - JIRA: <https://issues.apache.org/jira/browse/SPARK-55668>
>>>>>   https://issues.apache.org/jira/browse/SPARK-56249
>>>>>
>>>>> The vote will be open for at least 72 hours. Please vote:
>>>>>
>>>>> [ ] +1: Accept the proposal as an official SPIP
>>>>> [ ] +0
>>>>> [ ] -1: I don't think this is a good idea because ...
>>>>>
>>>>> Cheers -Andreas
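
For anyone skimming the thread, the out-of-order and deletion handling that the proposal wants to hide behind a declarative flow can be illustrated with a small pure-Python sketch of SCD Type 1 semantics. This is not the SPIP's API or implementation: the names `Change` and `apply_scd1` are hypothetical, and the rule "the highest sequence number per key wins, and deletes leave a tombstone" is an assumption about the intended behaviour rather than something the proposal text spells out.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional, Tuple


@dataclass
class Change:
    """One change event (hypothetical shape, for illustration only)."""
    key: Any                      # corresponds to KEYS (...)
    sequence: int                 # corresponds to SEQUENCE BY ...
    operation: str                # "UPSERT" or "DELETE"
    columns: Optional[Dict[str, Any]] = None


def apply_scd1(changes: Iterable[Change]) -> Dict[Any, Dict[str, Any]]:
    """Fold change events into a 1:1 replica (SCD Type 1).

    Events may arrive in any order. For each key, only the event with the
    highest sequence number takes effect. A DELETE is remembered as a
    tombstone so that an older upsert arriving later cannot resurrect the row.
    """
    # key -> (latest sequence seen, row contents or None for a tombstone)
    state: Dict[Any, Tuple[int, Optional[Dict[str, Any]]]] = {}
    for change in changes:
        seen = state.get(change.key)
        if seen is not None and seen[0] >= change.sequence:
            continue  # stale event: a newer one was already applied
        row = None if change.operation == "DELETE" else dict(change.columns or {})
        state[change.key] = (change.sequence, row)
    # Tombstones are dropped from the visible replica.
    return {key: row for key, (_, row) in state.items() if row is not None}


events = [
    Change(key=1, sequence=2, operation="UPSERT", columns={"city": "Berlin"}),
    Change(key=1, sequence=1, operation="UPSERT", columns={"city": "Paris"}),  # late
    Change(key=2, sequence=5, operation="DELETE"),
    Change(key=2, sequence=4, operation="UPSERT", columns={"city": "Rome"}),   # late
]
replica = apply_scd1(events)
print(replica)  # {1: {'city': 'Berlin'}}
```

The tombstone is the part that hand-written MERGE logic tends to miss: without it, the late upsert for key 2 would bring back a row that was already deleted.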
