Eric Marnadi created SPARK-55402:
------------------------------------
Summary: Move streamingSourceIdentifyingName from CatalogTable to DataSource
Key: SPARK-55402
URL: https://issues.apache.org/jira/browse/SPARK-55402
Project: Spark
Issue Type: Task
Components: Structured Streaming
Affects Versions: 4.2.0
Reporter: Eric Marnadi
streamingSourceIdentifyingName represents query-specific metadata (which source
name was assigned in a particular streaming query plan), not an intrinsic
property of the table itself. Storing it in CatalogTable breaks table equality
semantics:
- Two references to the same table in a single query can have different
streamingSourceIdentifyingName values
- This causes them to compare as unequal via CatalogTable.equals()
- This can impact multi-statement transactions and any caching/deduplication
logic that relies on CatalogTable equality.

By moving this field to DataSource (which is already query-specific), we
restore proper catalog table equality while maintaining the ability to track
streaming source identifying names for stable checkpoints.
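
To illustrate the equality problem described above, here is a minimal sketch. It does not use the real Spark classes: CatalogTableBefore, CatalogTableAfter and DataSourceAfter are hypothetical, heavily simplified stand-ins, and the table and source names are made up. It only assumes that equality is structural, as it is for Scala case classes.

```scala
// Hypothetical, simplified stand-ins for illustration only.
// The real CatalogTable and DataSource carry many more fields.

// Before: the query-specific name is stored on the table metadata.
final case class CatalogTableBefore(
    identifier: String,
    streamingSourceIdentifyingName: Option[String] = None)

// After: the table metadata carries only intrinsic properties.
final case class CatalogTableAfter(identifier: String)

// After: the query-specific name is stored on the per-query data source.
final case class DataSourceAfter(
    table: CatalogTableAfter,
    streamingSourceIdentifyingName: Option[String] = None)

object EqualitySketch {
  def main(args: Array[String]): Unit = {
    // Two references to the same table in one query, with different
    // source names assigned by the streaming query plan.
    val a = CatalogTableBefore("db.events", Some("source_0"))
    val b = CatalogTableBefore("db.events", Some("source_1"))
    println(a == b)        // false: same table, unequal metadata
    println(Set(a, b).size) // 2: deduplication by equality fails

    val left = DataSourceAfter(CatalogTableAfter("db.events"), Some("source_0"))
    val right = DataSourceAfter(CatalogTableAfter("db.events"), Some("source_1"))
    println(left.table == right.table)          // true: table equality restored
    println(Set(left.table, right.table).size)  // 1: deduplication works again
    println(left == right)                      // false: sources stay distinct
  }
}
```

With the name on the data source, the two references still differ where that matters (per-source checkpoint tracking), while anything keyed on the table itself sees a single table.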