Gengliang Wang created SPARK-56711:
--------------------------------------
Summary: CDC: Restricting data type of _commit_version to Long / String
Key: SPARK-56711
URL: https://issues.apache.org/jira/browse/SPARK-56711
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 4.2.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang
1. Every realistic CDC source falls into one of two camps: a monotonic numeric
version (Delta, Iceberg, LSN, Kafka offset → Long) or an opaque identifier
(commit hashes, composite IDs → String). The other currently allowed types
either are strict subsets (Integer ⊂ Long) or duplicate the role of
_commit_timestamp (Timestamp), while Float/Double/Decimal/Boolean/Binary add
only NaN/precision/ordering foot-guns with no gain in expressive power.
2. A narrower contract can always be relaxed later; widening is the one-way
door. Going from "Long+String" to "any AtomicType" is non-breaking, whereas
the reverse breaks existing connectors. Locking the contract down now, while
there are zero external connectors, costs nothing.
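A minimal sketch of the proposed contract and the pitfalls it avoids, written in Python for illustration only. All names here (ALLOWED_COMMIT_VERSION_TYPES, is_valid_commit_version_type, HYPOTHETICAL_WIDER_TYPES) and the lowercase type labels are hypothetical; they model Spark types as plain strings and do not correspond to Spark's actual CDC API.

```python
# Hypothetical model of the proposed _commit_version contract.
# Type names are plain string labels, NOT Spark's real DataType API.
ALLOWED_COMMIT_VERSION_TYPES = frozenset({"long", "string"})


def is_valid_commit_version_type(type_name: str) -> bool:
    """Return True if _commit_version may use this (hypothetical) type label."""
    return type_name.lower() in ALLOWED_COMMIT_VERSION_TYPES


# Foot-gun 1: NaN is unordered, so it is neither less than, greater than,
# nor even equal to itself; "latest version" becomes ill-defined.
nan = float("nan")
nan_is_unordered = not (nan < 1.0) and not (nan > 1.0) and nan != nan

# Foot-gun 2: a Double has a 53-bit significand, so two distinct Long
# offsets above 2**53 can round to the same Double value.
v1, v2 = 2**53, 2**53 + 1
collides_as_double = v1 != v2 and float(v1) == float(v2)

# One-way door: every type accepted by the narrow contract is still accepted
# by a wider one (hypothetical example set), so widening breaks no connector.
# Narrowing is unsafe because the wider set has types the narrow one rejects.
HYPOTHETICAL_WIDER_TYPES = ALLOWED_COMMIT_VERSION_TYPES | {"integer", "timestamp", "double"}
narrow_to_wide_safe = ALLOWED_COMMIT_VERSION_TYPES <= HYPOTHETICAL_WIDER_TYPES
wide_to_narrow_safe = HYPOTHETICAL_WIDER_TYPES <= ALLOWED_COMMIT_VERSION_TYPES

if __name__ == "__main__":
    print(is_valid_commit_version_type("Long"))    # True
    print(is_valid_commit_version_type("Double"))  # False
    print(nan_is_unordered, collides_as_double)    # True True
    print(narrow_to_wide_safe, wide_to_narrow_safe)  # True False
```

The subset check is the crux of point 2: a connector that satisfies the narrow contract keeps satisfying any superset, while the reverse direction can reject a connector that used to be valid.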
--
This message was sent by Atlassian Jira
(v8.20.10#820010)