This is an automated email from the ASF dual-hosted git repository.
cloud-fan pushed a commit to branch branch-4.2
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-4.2 by this push:
new 7ee145d2c569 [SPARK-46625][SQL][FOLLOWUP] Resolve identifier
expression in InsertIntoStatement/V2WriteCommand table slot
7ee145d2c569 is described below
commit 7ee145d2c5691fc5cdeac0bf28de4da2daf9d0f4
Author: haoyangeng-db <[email protected]>
AuthorDate: Fri May 22 16:39:12 2026 +0800
[SPARK-46625][SQL][FOLLOWUP] Resolve identifier expression in
InsertIntoStatement/V2WriteCommand table slot
### What changes were proposed in this pull request?
This is a follow-up to SPARK-46625 (PR #55949 - "Place IDENTIFIER
placeholder in command name slot").
SPARK-46625 moved `PlanWithUnresolvedIdentifier` from wrapping the whole
command into the command's identifier slot at parse time. For
`InsertIntoStatement` and `V2WriteCommand` the placeholder now lives in
`.table`, which is a non-child `LogicalPlan` slot (`override def child =
query`). That PR correctly added explicit recursion for that slot in
`BindParameters` (parameter binding) and `ResolveIdentifierClause`
(materializing the placeholder once `identifierExpr` is resolved), but th [...]
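The non-child-slot problem can be made concrete with a minimal toy model in plain Scala. `Plan`, `Leaf`, `Insert` and `transformUp` are hypothetical stand-ins, not Spark's classes. The one property carried over is that a generic bottom-up rewrite recurses into `children` only, so a plan stored in a field that is not reported as a child (here `table`) is never visited unless a rule handles that field explicitly.

```scala
// Toy model (hypothetical names, not Spark's API): a bottom-up rewrite that,
// like a children-only traversal, never looks at non-child fields.
sealed trait Plan {
  def children: Seq[Plan]
  def withNewChildren(newChildren: Seq[Plan]): Plan
}

final case class Leaf(name: String) extends Plan {
  def children: Seq[Plan] = Nil
  def withNewChildren(newChildren: Seq[Plan]): Plan = this
}

// `table` is a plan-typed field, but only `query` is reported as a child,
// mirroring `override def child = query` on the write commands.
final case class Insert(table: Plan, query: Plan) extends Plan {
  def children: Seq[Plan] = Seq(query)
  def withNewChildren(newChildren: Seq[Plan]): Plan = copy(query = newChildren.head)
}

object Traversal {
  // Rewrite the children first, then try the rule on the rebuilt node.
  def transformUp(plan: Plan)(rule: PartialFunction[Plan, Plan]): Plan = {
    val rebuilt = plan.withNewChildren(plan.children.map(c => transformUp(c)(rule)))
    rule.applyOrElse(rebuilt, (p: Plan) => p)
  }
}

object NonChildSlotDemo {
  val resolveX: PartialFunction[Plan, Plan] = { case Leaf("x") => Leaf("resolved") }

  // The generic traversal rewrites the query but leaves the `table` slot untouched.
  val generic: Plan = Traversal.transformUp(Insert(Leaf("x"), Leaf("x")))(resolveX)

  // An explicit case that reaches into the non-child `table` slot.
  val resolveTableSlot: PartialFunction[Plan, Plan] = {
    case i: Insert if i.table == Leaf("x") => i.copy(table = Leaf("resolved"))
  }
  val explicitRule: PartialFunction[Plan, Plan] = resolveTableSlot.orElse(resolveX)
  val withExplicitCase: Plan =
    Traversal.transformUp(Insert(Leaf("x"), Leaf("x")))(explicitRule)
}
```

Here `generic` keeps `Leaf("x")` in the `table` slot, while `withExplicitCase` rewrites both slots.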
This PR adds two cases at the top of `ResolveReferences.doApply` that
mirror the existing pattern: when `InsertIntoStatement.table` or
`V2WriteCommand.table` is an unresolved `PlanWithUnresolvedIdentifier`, resolve
`identifierExpr` via `resolveExpressionByPlanChildren(..., includeLastResort =
true)` (which runs the `resolveColsLastResort` path: `resolveVariables compose
resolveOuterRef`). The `!identifierExpr.resolved` guard keeps the cases
idempotent under bottom-up traversal.
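The scoping choice and the idempotence guard can be illustrated with a second toy model. It assumes, as a simplification of the `includeLastResort` path described above, that a name is first matched against the output columns of the resolution scope and only then looked up among session variables. `Expr`, `resolveName`, `Placeholder` and `resolveTableSlot` are hypothetical names. Resolving against the placeholder (which has no children) instead of against the INSERT (whose child is the query) lets a variable `a` win over a query column `a`, and the `resolved` check makes a second application a no-op.

```scala
// Toy model (hypothetical, not Spark's API): columns of the resolution scope are
// tried first; session variables are consulted only as a last resort.
sealed trait Expr { def resolved: Boolean }
final case class UnresolvedAttr(name: String) extends Expr { def resolved: Boolean = false }
final case class ColumnRef(name: String) extends Expr { def resolved: Boolean = true }
final case class VariableRef(name: String, value: String) extends Expr {
  def resolved: Boolean = true
}

object Scoping {
  def resolveName(
      e: Expr,
      scopeColumns: Seq[String],
      variables: Map[String, String],
      includeLastResort: Boolean): Expr = e match {
    case UnresolvedAttr(n) if scopeColumns.contains(n) => ColumnRef(n)
    case UnresolvedAttr(n) if includeLastResort && variables.contains(n) =>
      VariableRef(n, variables(n))
    case other => other
  }

  // Stand-in for the IDENTIFIER placeholder: it has no children, so it
  // contributes no columns to the resolution scope.
  final case class Placeholder(identifierExpr: Expr) {
    def scopeColumns: Seq[String] = Nil
  }

  // Guarded case: only an unresolved expression is touched, so re-running is a no-op.
  def resolveTableSlot(p: Placeholder, variables: Map[String, String]): Placeholder =
    if (p.identifierExpr.resolved) p
    else p.copy(identifierExpr =
      resolveName(p.identifierExpr, p.scopeColumns, variables, includeLastResort = true))
}

object ScopingDemo {
  import Scoping._
  val variables = Map("a" -> "t_shadow") // DECLARE VARIABLE a STRING DEFAULT 't_shadow'
  val queryColumns = Seq("a")            // SELECT 42 AS a

  // Resolved against the placeholder itself: the variable wins.
  val once: Placeholder = resolveTableSlot(Placeholder(UnresolvedAttr("a")), variables)
  val twice: Placeholder = resolveTableSlot(once, variables)

  // Resolved against the query's columns instead: the column would shadow the variable.
  val wrongScope: Expr =
    resolveName(UnresolvedAttr("a"), queryColumns, variables, includeLastResort = true)
}
```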
### Why are the changes needed?
Without this, `INSERT INTO IDENTIFIER(<sql-variable>) ...` fails analysis:
the `UnresolvedAttribute` for the variable name sitting inside
`PlanWithUnresolvedIdentifier.identifierExpr` is never rewritten to a
`VariableReference`. Since `ResolveIdentifierClause` only fires when
`identifierExpr.resolved && childrenResolved`, the placeholder never
materializes; the plan reaches `PreprocessTableInsertion` with an unresolved
attribute and errors out (e.g. `UNSUPPORTED_INSERT.RDD_BASED`).
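The dependency between the two rules can be sketched as a toy fixed-point loop. The names are hypothetical and the model is deliberately simplified: `materialize` stands in for `ResolveIdentifierClause` and fires only once the identifier expression is resolved, while `resolveVariables` stands in for the new `ResolveReferences` cases. With `materialize` alone, the loop reaches a fixed point with the placeholder still in place, which is the stuck state described above.

```scala
// Toy model (hypothetical, not Spark's API) of two analyzer rules run to a fixed point.
sealed trait Expr { def resolved: Boolean }
final case class UnresolvedAttr(name: String) extends Expr { def resolved: Boolean = false }
final case class VariableRef(name: String, value: String) extends Expr {
  def resolved: Boolean = true
}

sealed trait TableSlot
final case class Placeholder(identifierExpr: Expr) extends TableSlot
final case class Relation(name: String) extends TableSlot

object Rules {
  // Stand-in for ResolveIdentifierClause: fires only when identifierExpr is resolved.
  val materialize: TableSlot => TableSlot = {
    case Placeholder(v: VariableRef) if v.resolved => Relation(v.value)
    case other => other
  }

  // Stand-in for the new ResolveReferences cases: rewrite the name to a variable reference.
  def resolveVariables(variables: Map[String, String]): TableSlot => TableSlot = {
    case Placeholder(UnresolvedAttr(n)) if variables.contains(n) =>
      Placeholder(VariableRef(n, variables(n)))
    case other => other
  }

  // Apply every rule in order until the plan stops changing (or the budget runs out).
  @annotation.tailrec
  def fixedPoint(t: TableSlot, rules: Seq[TableSlot => TableSlot], budget: Int = 10): TableSlot = {
    val next = rules.foldLeft(t)((acc, rule) => rule(acc))
    if (next == t || budget <= 1) next else fixedPoint(next, rules, budget - 1)
  }
}

object GatingDemo {
  import Rules._
  val vars = Map("target_table" -> "t") // SET VAR target_table = 't'
  val start: TableSlot = Placeholder(UnresolvedAttr("target_table"))

  val withoutFix: TableSlot = fixedPoint(start, Seq(materialize))
  val withFix: TableSlot = fixedPoint(start, Seq(resolveVariables(vars), materialize))
}
```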
Repro on master before this fix:
```sql
CREATE TABLE t (a INT) USING PARQUET;
DECLARE OR REPLACE VARIABLE target_table STRING;
SET VAR target_table = 't';
INSERT INTO IDENTIFIER(target_table) SELECT 42 AS a;
```
The same shape applies to `OverwriteByExpression.table` (e.g. the `REPLACE WHERE`,
`REPLACE ON` and `REPLACE USING` variants of INSERT); these are fixed by the same
`V2WriteCommand` case.
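Why a single `V2WriteCommand` case also covers `OverwriteByExpression` can be sketched with a toy trait. Each write command exposes its `table` slot and a `withNewTable` copy method, so a rule written once against the trait rewrites every implementation. `ToyWriteCommand`, `ToyAppendData`, `ToyOverwriteByExpression` and `WriteRule` are hypothetical stand-ins, not Spark's definitions.

```scala
// Toy model (hypothetical, not Spark's classes): a shared trait with a table slot
// lets one rule handle every write command that implements it.
sealed trait WriteTarget
final case class PlaceholderSlot(name: String) extends WriteTarget
final case class ResolvedSlot(name: String) extends WriteTarget

sealed trait ToyWriteCommand {
  def table: WriteTarget
  def withNewTable(newTable: WriteTarget): ToyWriteCommand
}

final case class ToyAppendData(table: WriteTarget) extends ToyWriteCommand {
  def withNewTable(newTable: WriteTarget): ToyWriteCommand = copy(table = newTable)
}

final case class ToyOverwriteByExpression(table: WriteTarget, deleteExpr: String)
    extends ToyWriteCommand {
  def withNewTable(newTable: WriteTarget): ToyWriteCommand = copy(table = newTable)
}

object WriteRule {
  // Written once against the trait; no per-command case is needed.
  def resolveTableSlot(w: ToyWriteCommand): ToyWriteCommand = w.table match {
    case PlaceholderSlot(n) => w.withNewTable(ResolvedSlot(n))
    case _: ResolvedSlot => w
  }
}

object WriteRuleDemo {
  val commands: Seq[ToyWriteCommand] = Seq(
    ToyAppendData(PlaceholderSlot("t")),
    ToyOverwriteByExpression(PlaceholderSlot("t"), "a = 1"))
  val rewritten: Seq[ToyWriteCommand] = commands.map(WriteRule.resolveTableSlot)
}
```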
### Does this PR introduce _any_ user-facing change?
No. Bug-fix only.
### How was this patch tested?
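Two new tests added to `ParametersSuite`: one runs `INSERT INTO
IDENTIFIER(<sql-variable>)` end to end, and one checks that the variable is still
used when a query output column has the same name.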
### Was this patch authored or co-authored using generative AI tooling?
Co-authored with Claude Code.
Closes #56024 from haoyangeng-db/spark-46625-followup-resolve-identifier.
Authored-by: haoyangeng-db <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
(cherry picked from commit 96f43d99851586052373cbf7e0eeefef4f30f70d)
Signed-off-by: Wenchen Fan <[email protected]>
---
.../spark/sql/catalyst/analysis/Analyzer.scala | 31 ++++++++++++++++
.../org/apache/spark/sql/ParametersSuite.scala | 42 ++++++++++++++++++++++
2 files changed, 73 insertions(+)
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index da2e57c0a649..95a856ac28e6 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -1540,6 +1540,37 @@ class Analyzer(
     }
 
     def doApply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp {
+      // `InsertIntoStatement.table` and `V2WriteCommand.table` are non-child `LogicalPlan`
+      // slots (`child = query`), so the default `resolveOperatorsUp` + `mapExpressions`
+      // traversal never resolves expressions placed inside them. For a
+      // `PlanWithUnresolvedIdentifier`, `identifierExpr` (e.g. an `UnresolvedAttribute`
+      // referring to a SQL variable in `INSERT INTO IDENTIFIER(target_table) ...`) must
+      // be resolved here before `ResolveIdentifierClause` can materialize the relation.
+      // Mirror the structural recursion into the non-child `.table` slot that
+      // `BindParameters` and `ResolveIdentifierClause` already do for the same shape
+      // (SPARK-46625); unlike those rules, this one performs attribute resolution rather
+      // than parameter binding or placeholder materialization. Resolve against `p` (whose
+      // `children` are `Nil` on the INSERT / `OverwriteByExpression` path built by
+      // `buildWriteTableSlot`) so the IDENTIFIER expression cannot see query output
+      // columns -- only the last-resort variable resolution path fires. The
+      // `!identifierExpr.resolved` guard makes the case idempotent under bottom-up
+      // traversal.
+      case i: InsertIntoStatement
+          if i.table.isInstanceOf[PlanWithUnresolvedIdentifier] &&
+            !i.table.asInstanceOf[PlanWithUnresolvedIdentifier].identifierExpr.resolved =>
+        val p = i.table.asInstanceOf[PlanWithUnresolvedIdentifier]
+        val resolvedExpr = resolveExpressionByPlanChildren(
+          p.identifierExpr, p, includeLastResort = true)
+        i.copy(table = p.copy(identifierExpr = resolvedExpr))
+
+      case w: V2WriteCommand
+          if w.table.isInstanceOf[PlanWithUnresolvedIdentifier] &&
+            !w.table.asInstanceOf[PlanWithUnresolvedIdentifier].identifierExpr.resolved =>
+        val p = w.table.asInstanceOf[PlanWithUnresolvedIdentifier]
+        val resolvedExpr = resolveExpressionByPlanChildren(
+          p.identifierExpr, p, includeLastResort = true)
+        w.withNewTable(p.copy(identifierExpr = resolvedExpr))
+
       // Don't wait other rules to resolve the child plans of `InsertIntoStatement` as we need
       // to resolve column "DEFAULT" in the child plans so that they must be unresolved.
       case i: InsertIntoStatement => resolveColumnDefaultInCommandInputQuery(i)
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/ParametersSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/ParametersSuite.scala
index 575fcc058169..ca7732772b58 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/ParametersSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/ParametersSuite.scala
@@ -2586,6 +2586,48 @@ class ParametersSuite extends SharedSparkSession {
       s"Expected :tname inside OverwriteByExpression.table to be bound, got:\n$boundOverwrite")
   }
 
+  // SPARK-46625 followup: `INSERT INTO IDENTIFIER(<sql-variable>) ...` places a
+  // `PlanWithUnresolvedIdentifier` in `InsertIntoStatement.table`, whose `identifierExpr`
+  // holds an `UnresolvedAttribute` for the variable name. That slot is a non-child
+  // `LogicalPlan`, so the default `ResolveReferences` traversal never resolves the
+  // attribute, `ResolveIdentifierClause` cannot fire (it waits on `identifierExpr.resolved`),
+  // and analysis fails. Verify that the explicit `InsertIntoStatement` case added to
+  // `ResolveReferences` rewrites the attribute to a `VariableReference` and the insert
+  // completes end-to-end.
+  test("SPARK-46625: INSERT INTO IDENTIFIER(<sql-variable>) resolves variable in table slot") {
+    withTable("t_var_insert") {
+      sql("CREATE TABLE t_var_insert (a INT) USING PARQUET")
+      sql("DECLARE OR REPLACE VARIABLE target_table STRING")
+      try {
+        sql("SET VAR target_table = 't_var_insert'")
+        sql("INSERT INTO IDENTIFIER(target_table) SELECT 42 AS a")
+        checkAnswer(spark.table("t_var_insert"), Row(42))
+      } finally {
+        sql("DROP TEMPORARY VARIABLE IF EXISTS target_table")
+      }
+    }
+  }
+
+  // SPARK-46625 followup: when the SQL variable name in `IDENTIFIER(<name>)` collides
+  // with a query output column, the IDENTIFIER expression must still bind to the
+  // variable, not to the column. The `ResolveReferences` case for `InsertIntoStatement`
+  // resolves `identifierExpr` against the `PlanWithUnresolvedIdentifier` itself (whose
+  // `children` are `Nil` on this path), not against the surrounding `InsertIntoStatement`
+  // (whose child is `query`), so query output columns are out of scope and only the
+  // last-resort variable resolution path fires.
+  test("SPARK-46625: INSERT INTO IDENTIFIER(<sql-variable>) ignores colliding query columns") {
+    withTable("t_shadow") {
+      sql("CREATE TABLE t_shadow (a INT) USING PARQUET")
+      sql("DECLARE OR REPLACE VARIABLE a STRING DEFAULT 't_shadow'")
+      try {
+        sql("INSERT INTO IDENTIFIER(a) SELECT 42 AS a")
+        checkAnswer(spark.table("t_shadow"), Row(42))
+      } finally {
+        sql("DROP TEMPORARY VARIABLE IF EXISTS a")
+      }
+    }
+  }
+
   // SPARK-46625: `CacheTableAsSelect.tempViewName` is an `Expression` slot, so an
   // `IDENTIFIER(<non-literal>)` produces an `ExpressionWithUnresolvedIdentifier` there instead of
   // wrapping the entire command in a `PlanWithUnresolvedIdentifier`. Verify on the parsed plan