[ https://issues.apache.org/jira/browse/SPARK-56942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-56942:
-----------------------------------
    Labels: pull-request-available  (was: )

> Support nested column references as DSv2 row IDs
> ------------------------------------------------
>
>                 Key: SPARK-56942
>                 URL: https://issues.apache.org/jira/browse/SPARK-56942
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 4.2.0
>            Reporter: Pengfei Xu
>            Priority: Major
>              Labels: pull-request-available
>
> Connectors that implement `SupportsDelta` declare row identifiers via 
> `rowId()`, which returns `NamedReference[]`. A `NamedReference` may be 
> multi-segment (e.g. `["data", "pk"]` or `["_metadata", "row_index"]`), so the 
> API contract permits nested row IDs.
>
> During analysis, however, Spark calls
> `V2ExpressionUtils.resolveRefs[AttributeReference](operation.rowId, relation)`
> from both `RewriteRowLevelCommand.resolveRowIdAttrs` and
> `WriteDelta.rowIdAttrsResolved`. For a multi-segment reference, the resolver
> returns an `Alias(GetStructField(...))` instead of an `AttributeReference`, so
> the `asInstanceOf[AttributeReference]` cast throws a `ClassCastException`
> before any plan executes. DELETE, UPDATE, and MERGE against such a connector
> therefore fail outright.
>
> Widen the resolver call to `resolveRefs[NamedExpression]` and flatten the
> results back to attributes via `.toAttribute`. Both flat and nested row-id
> columns then work, and flat-column behavior is unchanged.
>
> This unblocks DSv2 connectors that identify rows by file-source metadata such
> as `(_metadata.file_path, _metadata.row_index)`, the natural row identity for
> position-delete / deletion-vector writes. Iceberg's
> `SparkPositionDeltaOperation` uses an analogous `[_file, _pos]` pattern built
> on flat metadata columns; this change lets DSv2 connectors that expose the
> same information as nested fields follow suit.
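
The failure and the fix can be illustrated with a small, self-contained Scala model. Everything in it is hypothetical: `Expression`, `NamedExpression`, `AttributeReference`, `GetStructField`, and `Alias` are simplified stand-ins named after the Catalyst classes (the real `GetStructField` takes an ordinal, and the real classes carry expression IDs and data types), `resolve` only approximates how a multi-part reference resolves against a relation (assuming the alias is named after the last segment), and `rowIdAttrsBefore` / `rowIdAttrsAfter` merely mirror the shape of the call sites named in the description. It is a sketch of the failure mode under those assumptions, not Spark's implementation, and it does not model how the resulting attributes are projected into the plan.

```scala
// Hypothetical, simplified model of the Catalyst types involved.
// These are NOT Spark's real classes; they only reproduce the failure mode.
sealed trait Expression
sealed trait NamedExpression extends Expression {
  def name: String
  def toAttribute: AttributeReference
}
final case class AttributeReference(name: String) extends NamedExpression {
  def toAttribute: AttributeReference = this
}
// Real GetStructField takes an ordinal; this model uses the field name.
final case class GetStructField(child: Expression, field: String) extends Expression
final case class Alias(child: Expression, name: String) extends NamedExpression {
  // Real Catalyst also carries exprId, data type, etc.; this keeps only the name.
  def toAttribute: AttributeReference = AttributeReference(name)
}

// Models resolving a multi-part reference: one segment yields the column
// attribute; more segments yield an Alias over nested struct-field
// extractions, named after the last segment (an assumption of this model).
def resolve(parts: Seq[String]): NamedExpression = parts match {
  case Seq() => throw new IllegalArgumentException("empty reference")
  case Seq(column) => AttributeReference(column)
  case column +: fields =>
    val extract = fields.foldLeft(AttributeReference(column): Expression)(GetStructField(_, _))
    Alias(extract, fields.last)
}

// Pre-fix shape: every resolved row ID is cast to an attribute.
def rowIdAttrsBefore(rowId: Seq[Seq[String]]): Seq[AttributeReference] =
  rowId.map(ref => resolve(ref).asInstanceOf[AttributeReference])

// Post-fix shape: accept any NamedExpression, then flatten with toAttribute.
def rowIdAttrsAfter(rowId: Seq[Seq[String]]): Seq[AttributeReference] =
  rowId.map(ref => resolve(ref).toAttribute)

val flatRowId   = Seq(Seq("id"))
val nestedRowId = Seq(Seq("_metadata", "file_path"), Seq("_metadata", "row_index"))

// The cast in the pre-fix shape fails as soon as a nested reference appears.
val beforeFailsOnNested =
  try { rowIdAttrsBefore(nestedRowId); false }
  catch { case _: ClassCastException => true }

println(rowIdAttrsBefore(flatRowId))   // List(AttributeReference(id))
println(beforeFailsOnNested)           // true
println(rowIdAttrsAfter(nestedRowId))  // List(AttributeReference(file_path), AttributeReference(row_index))
```

Because the model drops Catalyst's expression IDs, two nested references ending in the same segment (say `a.id` and `b.id`) would produce equal attributes here; in Spark each alias keeps its own ID, so the resulting attributes stay distinct.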


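To see why a `(_metadata.file_path, _metadata.row_index)` pair is a natural row identity for position deletes, here is a minimal Scala sketch that groups deleted row IDs into per-file sets of deleted positions, which a reader can consult while scanning each file. `RowId`, `toDeletionVectors`, and `isDeleted` are hypothetical names, and the sorted `Vector[Long]` is an assumption chosen for readability; production deletion vectors are typically compressed bitmaps. The sketch only shows that the pair locates a row physically, so a delete can be recorded against a data file without rewriting that file.

```scala
// Row identity under the file-metadata scheme (hypothetical type).
final case class RowId(filePath: String, rowIndex: Long)

// Group deleted row IDs by data file into a sorted, de-duplicated list of
// positions: a simplified stand-in for a per-file deletion vector.
def toDeletionVectors(deleted: Seq[RowId]): Map[String, Vector[Long]] =
  deleted.groupBy(_.filePath).map { case (path, ids) =>
    path -> ids.map(_.rowIndex).distinct.sorted.toVector
  }

// A scan of a data file skips any position recorded in that file's vector.
def isDeleted(dvs: Map[String, Vector[Long]], id: RowId): Boolean =
  dvs.get(id.filePath).exists(_.contains(id.rowIndex))

val dvs = toDeletionVectors(Seq(
  RowId("part-0.parquet", 7L),
  RowId("part-0.parquet", 2L),
  RowId("part-1.parquet", 0L),
  RowId("part-0.parquet", 7L) // duplicate delete is collapsed
))

println(dvs("part-0.parquet"))                        // Vector(2, 7)
println(isDeleted(dvs, RowId("part-1.parquet", 0L)))  // true
println(isDeleted(dvs, RowId("part-1.parquet", 1L)))  // false
```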

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
