[ 
https://issues.apache.org/jira/browse/SPARK-56711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-56711.
----------------------------------
    Fix Version/s: 4.2.0
                   5.0.0
       Resolution: Fixed

Issue resolved by pull request 55663
[https://github.com/apache/spark/pull/55663]

> CDC: Restricting data type of _commit_version to Long / String
> --------------------------------------------------------------
>
>                 Key: SPARK-56711
>                 URL: https://issues.apache.org/jira/browse/SPARK-56711
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 4.2.0
>            Reporter: Gengliang Wang
>            Assignee: Gengliang Wang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.2.0, 5.0.0
>
>
> 1. Every realistic CDC source falls into one of two camps: a monotonic
> numeric version (Delta, Iceberg, LSN, Kafka offset → Long) or an opaque
> identifier (commit hashes, composite IDs → String). The other allowed types
> are either strict subsets (Integer ⊂ Long) or duplicate the role of
> _commit_timestamp (Timestamp), and Float/Double/Decimal/Boolean/Binary just
> add NaN/precision/ordering foot-guns with no expressive power gained.
>
> 2. A narrower contract can always be relaxed later, whereas a wider one is
> a one-way door. Going from "Long+String" to "any AtomicType" is
> non-breaking; the reverse breaks connectors. Locking down now, while there
> are zero external connectors, costs nothing.
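
The rationale above can be illustrated with a minimal sketch. It is not the
code from the pull request: the names ALLOWED_COMMIT_VERSION_TYPES,
validate_commit_version_type and is_strictly_increasing, and the use of plain
type-name strings in place of Spark DataType objects, are assumptions made
for illustration only. It shows a Long/String-only check and two of the
Float/Double foot-guns the description mentions.

```python
# Hypothetical sketch: type names are plain strings here, not Spark DataTypes.
ALLOWED_COMMIT_VERSION_TYPES = {"long", "string"}


def validate_commit_version_type(schema: dict) -> None:
    """Raise ValueError unless `_commit_version` is declared as long or string.

    `schema` maps column names to type names, e.g. {"_commit_version": "long"}.
    """
    declared = schema.get("_commit_version")
    if declared is None:
        raise ValueError("schema has no _commit_version column")
    if declared.lower() not in ALLOWED_COMMIT_VERSION_TYPES:
        raise ValueError(
            f"_commit_version must be long or string, got {declared!r}"
        )


def is_strictly_increasing(versions) -> bool:
    """Return True if each version is strictly greater than the previous one."""
    return all(a < b for a, b in zip(versions, versions[1:]))


# Precision foot-gun: two distinct 64-bit integer versions collapse to the
# same value once they are stored as a double.
collapsed = float(2**53) == float(2**53 + 1)

# Ordering foot-gun: NaN compares false against everything, so a sequence
# containing it can never be verified as monotonic.
nan_breaks_ordering = not is_strictly_increasing([1.0, float("nan"), 2.0])
```

Under these assumptions, a schema declaring `_commit_version` as "double"
would be rejected, while "long" and "string" pass. Relaxing the allowed set
later only adds members to it, which is why widening is non-breaking for
connectors that already comply.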



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
