Gengliang Wang created SPARK-56711:
--------------------------------------

             Summary: CDC: Restricting data type of _commit_version to Long / String
                 Key: SPARK-56711
                 URL: https://issues.apache.org/jira/browse/SPARK-56711
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 4.2.0
            Reporter: Gengliang Wang
            Assignee: Gengliang Wang


1. Every realistic CDC source falls into one of two camps: a monotonic numeric
version (Delta, Iceberg, LSN, Kafka offset → Long) or an opaque identifier
(commit hashes, composite IDs → String). The other allowed types either are
strict subsets (Integer ⊂ Long) or duplicate the role of _commit_timestamp
(Timestamp), and Float/Double/Decimal/Boolean/Binary only add NaN, precision,
and ordering foot-guns without adding any expressive power.

2. The contract can only safely move in one direction: a narrower contract can
always be relaxed later, but a wider one cannot be narrowed. Going from
"Long+String" to "any AtomicType" is non-breaking; the reverse breaks
connectors. Locking down now, while there are zero external connectors, costs
nothing.
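
To make the two points above concrete, here is a minimal, self-contained Python
sketch. It is not Spark code: the type-name strings and the helpers
validate_commit_version_type and collides_as_double are hypothetical names
invented for this illustration, and the real check would presumably live in
Spark's own validation path. It shows the kind of check the narrower contract
implies, plus two of the Double foot-guns mentioned in point 1.

```python
import math

# Hypothetical stand-ins for the two proposed _commit_version types.
# These strings are assumptions for illustration, not Spark identifiers.
ALLOWED_COMMIT_VERSION_TYPES = {"long", "string"}


def validate_commit_version_type(type_name: str) -> None:
    """Reject any _commit_version type other than Long or String."""
    if type_name.lower() not in ALLOWED_COMMIT_VERSION_TYPES:
        raise ValueError(
            f"_commit_version must be Long or String, got {type_name}"
        )


def collides_as_double(a: int, b: int) -> bool:
    """True if two distinct integer versions become equal as 64-bit floats."""
    return a != b and float(a) == float(b)


# Foot-gun 1 (precision): a Double has a 53-bit significand, so distinct
# versions above 2**53 can collapse into the same value.
assert collides_as_double(2**53, 2**53 + 1)

# Foot-gun 2 (NaN/ordering): NaN is neither less than, greater than, nor
# equal to anything, so a range filter on versions never matches it.
nan = math.nan
assert not (nan < 1.0) and not (nan > 1.0) and nan != nan

# The narrow contract accepts only the two camps from point 1.
validate_commit_version_type("Long")
validate_commit_version_type("String")
```

Under this sketch, relaxing the contract later only means adding entries to the
allowed set, which cannot break a caller that already passes Long or String;
removing an entry is the change that would break existing callers.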



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
