[
https://issues.apache.org/jira/browse/SPARK-56711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hyukjin Kwon resolved SPARK-56711.
----------------------------------
Fix Version/s: 4.2.0, 5.0.0
Resolution: Fixed
Issue resolved by pull request 55663
[https://github.com/apache/spark/pull/55663]
> CDC: Restricting data type of _commit_version to Long / String
> --------------------------------------------------------------
>
> Key: SPARK-56711
> URL: https://issues.apache.org/jira/browse/SPARK-56711
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.2.0
> Reporter: Gengliang Wang
> Assignee: Gengliang Wang
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.2.0, 5.0.0
>
>
> 1. Every realistic CDC source falls into one of two camps: a monotonic
> numeric version (Delta, Iceberg, LSN, Kafka offset → Long) or an opaque
> identifier (commit hashes, composite IDs → String). The other allowed types
> are either strict subsets (Integer ⊂ Long) or duplicate the role of
> _commit_timestamp (Timestamp), and Float/Double/Decimal/Boolean/Binary just
> add NaN/precision/ordering foot-guns with no expressive power gained.
>
>
> 2. A narrower contract can always be relaxed later, whereas a wider one is a
> one-way door. Going from "Long+String" to "any AtomicType" is non-breaking;
> the reverse breaks connectors. Locking down now, while there are zero
> external connectors, costs nothing.
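
For illustration only, the two points above can be sketched in plain Python.
This is not the code from pull request 55663: the allow-list, the function
name, and the type-name strings are hypothetical stand-ins, and Python floats
are used here only because they are IEEE 754 doubles.

```python
# Hypothetical allow-list mirroring the "Long / String" contract.
# These strings are illustrative labels, not Spark type identifiers.
ALLOWED_COMMIT_VERSION_TYPES = {"long", "string"}


def is_allowed_commit_version_type(type_name: str) -> bool:
    """Return True if a connector-declared type name fits the narrowed contract."""
    return type_name.strip().lower() in ALLOWED_COMMIT_VERSION_TYPES


# Ordering foot-gun: NaN is neither less than, greater than, nor equal to anything,
# so a floating-point version column has no total order.
nan = float("nan")
nan_is_ordered = (nan < 1.0) or (nan > 1.0) or (nan == nan)

# Precision foot-gun: a double cannot distinguish 2**53 from 2**53 + 1,
# so two different commit versions would collapse into one value.
doubles_collide = float(2**53) == float(2**53 + 1)

# Integer versions keep the same two values distinct and ordered.
longs_distinct = (2**53) < (2**53 + 1)


if __name__ == "__main__":
    for name in ("long", "string", "double", "timestamp"):
        print(name, is_allowed_commit_version_type(name))
    print("NaN participates in ordering:", nan_is_ordered)
    print("adjacent versions collide as doubles:", doubles_collide)
    print("adjacent versions stay distinct as integers:", longs_distinct)
```

Under this sketch, relaxing the contract later only means adding an entry to
the allow-list, which does not affect any connector that already passes the
check. Removing an entry would reject connectors that were previously valid.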