+1

Qiegang

On Tue, Mar 3, 2026 at 7:57 PM Gengliang Wang <[email protected]> wrote:

> Hi Spark devs,
>
> I'd like to call a vote on the SPIP: *Change Data Capture (CDC) Support*.
>
> *Summary:*
>
> This SPIP proposes a unified approach to change data capture: a CHANGES SQL
> clause and corresponding DataFrame/DataStream APIs that work across DSv2
> connectors.
>
> 1. Standardized User API
>
> SQL:
>
> -- Batch: What changed between version 10 and 20?
> SELECT * FROM my_table CHANGES FROM VERSION 10 TO VERSION 20;
>
> -- Streaming: Continuously process changes
> CREATE STREAMING TABLE cdc_sink AS
> SELECT * FROM STREAM my_table CHANGES FROM VERSION 0;
>
> DataFrame API:
>
> spark.read
>   .option("startingVersion", "10")
>   .option("endingVersion", "20")
>   .changes("my_table")
>
> 2. Engine-Level Post Processing
>
> Under the hood, this proposal introduces a minimal Changelog interface for
> DSv2 connectors. Spark's Catalyst optimizer will take over the CDC
> post-processing, including:
>
>    - Filtering out copy-on-write carry-over rows.
>    - Deriving pre-image/post-image updates from raw insert/delete pairs.
>    - Computing net changes.
>
>
> *Relevant Links:*
>
>    - *SPIP Doc:*
>    https://docs.google.com/document/d/1-4rCS3vsGIyhwnkAwPsEaqyUDg-AuVkdrYLotFPw0U0/edit?usp=sharing
>    - *Discuss Thread:*
>    https://lists.apache.org/thread/dhxx6pohs7fvqc3knzhtoj4tbcgrwxts
>    - *JIRA:* https://issues.apache.org/jira/browse/SPARK-55668
>
>
> *The vote will be open for at least 72 hours.* Please vote:
>
> [ ] +1: Accept the proposal as an official SPIP
>
> [ ] +0
>
> [ ] -1: I don't think this is a good idea because ...
>
> Thanks,
> Gengliang Wang
>
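
For readers who want a concrete picture of the three post-processing steps listed in the quoted proposal, here is a minimal pure-Python sketch. It is not the SPIP's design and not Spark code: the Change record, the function names, and the update_preimage/update_postimage labels are all assumptions made for illustration, and the pairing rule (same version and same key) is a deliberate simplification.

```python
from collections import Counter, defaultdict
from typing import Dict, List, NamedTuple, Optional, Tuple


class Change(NamedTuple):
    """Hypothetical raw change row, as a connector might emit it."""
    version: int
    key: str    # assumed row identifier
    value: str  # assumed row content
    kind: str   # "insert" or "delete" in raw connector output


def drop_carry_overs(changes: List[Change]) -> List[Change]:
    """Drop delete/insert pairs with identical content in the same version.

    Such pairs appear when copy-on-write rewrites a file and unchanged rows
    are carried over; they do not represent a logical change.
    """
    inserts = Counter((c.version, c.key, c.value) for c in changes if c.kind == "insert")
    deletes = Counter((c.version, c.key, c.value) for c in changes if c.kind == "delete")
    paired = inserts & deletes  # number of carry-over pairs per identical row
    to_drop = {"insert": Counter(paired), "delete": Counter(paired)}
    kept = []
    for c in changes:
        ident = (c.version, c.key, c.value)
        if to_drop[c.kind][ident] > 0:
            to_drop[c.kind][ident] -= 1
        else:
            kept.append(c)
    return kept


def derive_updates(changes: List[Change]) -> List[Change]:
    """Relabel a delete plus an insert of one key in one version as an update."""
    kinds_seen = defaultdict(set)
    for c in changes:
        kinds_seen[(c.version, c.key)].add(c.kind)
    relabel = {"delete": "update_preimage", "insert": "update_postimage"}
    return [
        c._replace(kind=relabel[c.kind])
        if kinds_seen[(c.version, c.key)] == {"insert", "delete"}
        else c
        for c in changes
    ]


def net_changes(changes: List[Change]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Collapse the range to (value_before, value_after) per key.

    None means the row did not exist. Keys whose state is the same before
    and after the range are omitted.
    """
    # Within a version, removals are applied before additions.
    order = {"delete": 0, "update_preimage": 0, "insert": 1, "update_postimage": 1}
    before: Dict[str, Optional[str]] = {}
    after: Dict[str, Optional[str]] = {}
    for c in sorted(changes, key=lambda c: (c.version, order[c.kind])):
        removes = order[c.kind] == 0
        if c.key not in before:
            before[c.key] = c.value if removes else None
        after[c.key] = None if removes else c.value
    return {k: (before[k], after[k]) for k in before if before[k] != after[k]}


if __name__ == "__main__":
    raw = [
        Change(1, "a", "x", "insert"),
        Change(2, "a", "x", "delete"), Change(2, "a", "y", "insert"),  # real update
        Change(2, "b", "k", "delete"), Change(2, "b", "k", "insert"),  # carry-over
        Change(3, "c", "z", "insert"),
        Change(4, "c", "z", "delete"),
    ]
    cleaned = derive_updates(drop_carry_overs(raw))
    for change in cleaned:
        print(change)
    print(net_changes(cleaned))
```

In this toy input, the rewrite of key "b" is treated as a carry-over and disappears, the change to key "a" in version 2 becomes a pre-image/post-image pair, and key "c" drops out of the net result because it is inserted and deleted within the range. A real engine-level implementation would express these steps as query plan rewrites rather than Python loops.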
