+1!

On Tue, Mar 3, 2026 at 5:48 PM Szehon Ho <[email protected]> wrote:
> +1, look forward to it (non binding)
>
> Thanks
> Szehon
>
> On Tue, Mar 3, 2026 at 5:37 PM Anton Okolnychyi <[email protected]>
> wrote:
>
>> +1 (non-binding)
>>
>> On Tue, Mar 3, 2026 at 5:07 PM Mich Talebzadeh <[email protected]>
>> wrote:
>>
>>> +1
>>>
>>> Dr Mich Talebzadeh,
>>> Data Scientist | Distributed Systems (Spark) | Financial Forensics &
>>> Metadata Analytics | Transaction Reconstruction | Audit & Evidence-Based
>>> Analytics
>>>
>>> view my Linkedin profile
>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>
>>> On Wed, 4 Mar 2026 at 00:57, Gengliang Wang <[email protected]> wrote:
>>>
>>>> Hi Spark devs,
>>>>
>>>> I'd like to call a vote on the SPIP: *Change Data Capture (CDC)
>>>> Support*
>>>>
>>>> *Summary:*
>>>>
>>>> This SPIP proposes a unified approach by adding a CHANGES SQL clause
>>>> and corresponding DataFrame/DataStream APIs that work across DSv2
>>>> connectors.
>>>>
>>>> 1. Standardized User API
>>>>
>>>> SQL:
>>>>
>>>>   -- Batch: What changed between version 10 and 20?
>>>>   SELECT * FROM my_table CHANGES FROM VERSION 10 TO VERSION 20;
>>>>
>>>>   -- Streaming: Continuously process changes
>>>>   CREATE STREAMING TABLE cdc_sink AS
>>>>   SELECT * FROM STREAM my_table CHANGES FROM VERSION 0;
>>>>
>>>> DataFrame API:
>>>>
>>>>   spark.read
>>>>     .option("startingVersion", "10")
>>>>     .option("endingVersion", "20")
>>>>     .changes("my_table")
>>>>
>>>> 2. Engine-Level Post Processing
>>>>
>>>> Under the hood, this proposal introduces a minimal Changelog
>>>> interface for DSv2 connectors. Spark's Catalyst optimizer will take
>>>> over the CDC post-processing, including:
>>>>
>>>> - Filtering out copy-on-write carry-over rows.
>>>> - Deriving pre-image/post-image updates from raw insert/delete pairs.
>>>> - Computing net changes.
>>>>
>>>> *Relevant Links:*
>>>>
>>>> - *SPIP Doc:*
>>>>   https://docs.google.com/document/d/1-4rCS3vsGIyhwnkAwPsEaqyUDg-AuVkdrYLotFPw0U0/edit?usp=sharing
>>>> - *Discuss Thread:*
>>>>   https://lists.apache.org/thread/dhxx6pohs7fvqc3knzhtoj4tbcgrwxts
>>>> - *JIRA:* https://issues.apache.org/jira/browse/SPARK-55668
>>>>
>>>> *The vote will be open for at least 72 hours.* Please vote:
>>>>
>>>> [ ] +1: Accept the proposal as an official SPIP
>>>>
>>>> [ ] +0
>>>>
>>>> [ ] -1: I don't think this is a good idea because ...
>>>>
>>>> Thanks,
>>>> Gengliang Wang
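
For readers unfamiliar with the three post-processing steps named in the summary (carry-over filtering, pre-image/post-image derivation, net changes), here is a rough plain-Python sketch of what each step means on a toy changelog. Everything in it is an assumption made for illustration: the `Change` record, the function names, and the rule that each key has at most one live row per version are hypothetical, and none of it reflects the proposed Changelog interface or how Catalyst would actually implement the rewrite.

```python
from collections import Counter, defaultdict
from typing import List, NamedTuple


class Change(NamedTuple):
    """Hypothetical change row; not the SPIP's schema."""
    version: int
    kind: str   # "insert", "delete", "update_preimage", "update_postimage"
    key: str
    value: int


def drop_carry_over(changes: List[Change]) -> List[Change]:
    """Cancel delete/insert pairs with identical content in the same version.

    A copy-on-write rewrite re-emits untouched rows as delete + insert;
    those pairs carry no real change.
    """
    deletes = Counter((c.version, c.key, c.value) for c in changes if c.kind == "delete")
    inserts = Counter((c.version, c.key, c.value) for c in changes if c.kind == "insert")
    cancel = {ident: min(n, inserts[ident]) for ident, n in deletes.items()}
    remaining = {"delete": dict(cancel), "insert": dict(cancel)}
    out = []
    for c in changes:
        ident = (c.version, c.key, c.value)
        if c.kind in remaining and remaining[c.kind].get(ident, 0) > 0:
            remaining[c.kind][ident] -= 1  # drop one half of a carry-over pair
            continue
        out.append(c)
    return out


def derive_updates(changes: List[Change]) -> List[Change]:
    """Relabel a delete + insert on the same key and version as an update."""
    kinds = defaultdict(set)
    for c in changes:
        kinds[(c.version, c.key)].add(c.kind)
    out = []
    for c in changes:
        if kinds[(c.version, c.key)] >= {"insert", "delete"}:
            new_kind = "update_preimage" if c.kind == "delete" else "update_postimage"
            out.append(c._replace(kind=new_kind))
        else:
            out.append(c)
    return out


def net_changes(changes: List[Change]) -> List[Change]:
    """Collapse each key's history over the range into its net effect."""
    removed = {"delete", "update_preimage"}
    # Within a version, the removed image sorts before the added image.
    ordered = sorted(changes, key=lambda c: (c.version, c.kind not in removed))
    first, last = {}, {}
    for c in ordered:
        first.setdefault(c.key, c)
        last[c.key] = c
    out = []
    for key, f in first.items():
        l = last[key]
        existed_before = f.kind in removed
        exists_after = l.kind not in removed
        if existed_before and exists_after:
            if f.value != l.value:
                out.append(Change(l.version, "update_preimage", key, f.value))
                out.append(Change(l.version, "update_postimage", key, l.value))
        elif exists_after:
            out.append(Change(l.version, "insert", key, l.value))
        elif existed_before:
            out.append(Change(l.version, "delete", key, f.value))
        # Inserted and deleted inside the range: no net change.
    return out


# Toy raw changelog: version 12 rewrites a file holding "a" (untouched) and "b" (updated).
raw = [
    Change(11, "insert", "a", 1),
    Change(11, "insert", "b", 2),
    Change(12, "delete", "a", 1),
    Change(12, "insert", "a", 1),
    Change(12, "delete", "b", 2),
    Change(12, "insert", "b", 3),
    Change(13, "insert", "c", 5),
    Change(14, "delete", "c", 5),
]
```

Chaining the steps as `net_changes(derive_updates(drop_carry_over(raw)))` leaves only the inserts for "a" and "b" (with "b" at its final value), since "c" is created and removed inside the range. Doing this once in the engine, rather than in each connector, appears to be the motivation behind item 2 of the summary.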
