[
https://issues.apache.org/jira/browse/SPARK-57681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Anurag Mantripragada updated SPARK-57681:
-----------------------------------------
Description:
SPARK-36680 added the WITH (...) options clause for SELECT ([PR
#46707|https://github.com/apache/spark/pull/46707]), and SPARK-49098 extended
it to INSERT ([PR #47591|https://github.com/apache/spark/pull/47591]). However,
row-level DML commands (DELETE, UPDATE, MERGE) do not yet support this syntax.
A DataSource V2 connector such as Apache Iceberg needs per-statement options on
these commands to control behavior like copy-on-write vs merge-on-read,
delete-granularity, target-file-size-bytes, distribution-mode, isolation-level,
and branch selection.
This JIRA covers UPDATE only.
Proposed syntax (mirrors the existing SELECT/INSERT precedent):
{{UPDATE table WITH (`key` = 'value') SET ... WHERE ...}}
See the discussion on PR #46707:
- [https://github.com/apache/spark/pull/46707#issuecomment-2274055363]
- [https://github.com/apache/spark/pull/46707#issuecomment-2274312254]
was:
SPARK-36680 added the WITH (...) options clause for SELECT ([PR
#46707|https://github.com/apache/spark/pull/46707]), and SPARK-49098 extended
it to INSERT ([PR #47591|https://github.com/apache/spark/pull/47591]). However,
row-level DML commands (DELETE, UPDATE, MERGE) do not yet support this syntax.
A DataSource V2 connector such as Apache Iceberg needs per-statement options on
these commands to control behavior like copy-on-write vs merge-on-read,
delete-granularity, target-file-size-bytes, distribution-mode, isolation-level,
and branch selection. The DSv2 API already has the hooks
(RowLevelOperationInfo.options() and LogicalWriteInfo.options()), but they are
never populated for row-level commands because the rewrite rules hardcode
CaseInsensitiveStringMap.empty().
This JIRA covers DELETE and UPDATE. MERGE will be handled in a separate
follow-up.
Proposed syntax (mirrors the existing SELECT/INSERT precedent):
{{DELETE FROM table WITH (`key` = 'value') WHERE ...}}
{{UPDATE table WITH (`key` = 'value') SET ... WHERE ...}}
The options are surfaced as a single map and the connector disambiguates read
vs write keys internally, consistent with how RowLevelOperationInfo.options()
is designed and how Iceberg's SparkReadConf/SparkWriteConf already work.
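A minimal sketch of the single-map idea, written in Python for brevity. The
key names come from the list above, but the READ_KEYS/WRITE_KEYS sets, the
split_options helper, and the assignment of each key to a side are assumptions
made for illustration; they are not Spark's or Iceberg's actual API or
classification.

```python
# Hypothetical classification: a real connector defines its own key sets.
# A key may apply to both the read and the write side (assumed for "branch").
READ_KEYS = {"branch"}
WRITE_KEYS = {
    "branch",
    "delete-granularity",
    "target-file-size-bytes",
    "distribution-mode",
    "isolation-level",
}


def split_options(options):
    """Split one statement-level options map into (read, write, unknown).

    Keys are lower-cased before matching, imitating a case-insensitive map.
    """
    read, write, unknown = {}, {}, {}
    for key, value in options.items():
        normalized = key.lower()
        matched = False
        if normalized in READ_KEYS:
            read[normalized] = value
            matched = True
        if normalized in WRITE_KEYS:
            write[normalized] = value
            matched = True
        if not matched:
            unknown[normalized] = value
    return read, write, unknown


# Options as they might arrive from UPDATE t WITH (...) SET ... WHERE ...
# (the values are illustrative only).
read_opts, write_opts, unknown_opts = split_options(
    {"Branch": "audit", "isolation-level": "snapshot", "foo": "bar"}
)
```

Keeping one map at the SQL level means the parser does not need to know which
keys a given connector treats as read or write options; unrecognized keys are
returned separately here so the caller can decide whether to ignore or reject
them.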
See the discussion on PR #46707:
- [https://github.com/apache/spark/pull/46707#issuecomment-2274055363]
- [https://github.com/apache/spark/pull/46707#issuecomment-2274312254]
> Support dynamic table options for UPDATE
> ----------------------------------------
>
> Key: SPARK-57681
> URL: https://issues.apache.org/jira/browse/SPARK-57681
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 5.0.0
> Reporter: Anurag Mantripragada
> Priority: Major
> Labels: pull-request-available