(spark) branch branch-4.2 updated: [SPARK-46625][SQL][FOLLOWUP] Resolve identifier expression in InsertIntoStatement/V2WriteCommand table slot

wenchen Fri, 22 May 2026 01:40:04 -0700

This is an automated email from the ASF dual-hosted git repository.

cloud-fan pushed a commit to branch branch-4.2
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/branch-4.2 by this push:
     new 7ee145d2c569 [SPARK-46625][SQL][FOLLOWUP] Resolve identifier 
expression in InsertIntoStatement/V2WriteCommand table slot
7ee145d2c569 is described below

commit 7ee145d2c5691fc5cdeac0bf28de4da2daf9d0f4
Author: haoyangeng-db <[email protected]>
AuthorDate: Fri May 22 16:39:12 2026 +0800

    [SPARK-46625][SQL][FOLLOWUP] Resolve identifier expression in 
InsertIntoStatement/V2WriteCommand table slot
    
    ### What changes were proposed in this pull request?
    
    This is a follow-up to SPARK-46625 (PR #55949 - "Place IDENTIFIER 
placeholder in command name slot").
    
    SPARK-46625 moved `PlanWithUnresolvedIdentifier` from wrapping the whole 
command into the command's identifier slot at parse time. For 
`InsertIntoStatement` and `V2WriteCommand` the placeholder now lives in 
`.table`, which is a non-child `LogicalPlan` slot (`override def child = 
query`). That PR correctly added explicit recursion for that slot in 
`BindParameters` (parameter binding) and `ResolveIdentifierClause` 
(materializing the placeholder once `identifierExpr` is resolved), but th [...]
    
    This PR adds two cases at the top of `ResolveReferences.doApply` that 
mirror the existing pattern: when `InsertIntoStatement.table` or 
`V2WriteCommand.table` is an unresolved `PlanWithUnresolvedIdentifier`, resolve 
`identifierExpr` via `resolveExpressionByPlanChildren(..., includeLastResort = 
true)` (which runs the `resolveColsLastResort` path: `resolveVariables compose 
resolveOuterRef`). The `!identifierExpr.resolved` guard keeps the cases 
idempotent under bottom-up traversal.
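    
    As a rough illustration of why a children-only traversal skips the 
    `.table` slot, here is a toy Scala model. It is not Spark code: `Plan`, 
    `Insert`, `IdentifierPlaceholder`, `transformUp`, `generic`, `tableSlot` 
    and the `vars` map are hypothetical stand-ins for the analyzer classes, 
    and the actual change is the one in `ResolveReferences.doApply`.
    
    ```scala
    // Toy model only: every name here is hypothetical, none is Spark's API.
    sealed trait Expr { def resolved: Boolean }
    case class UnresolvedName(name: String) extends Expr {
      val resolved: Boolean = false
    }
    case class VariableRef(name: String, value: String) extends Expr {
      val resolved: Boolean = true
    }

    sealed trait Plan {
      def children: Seq[Plan]
      def withChildren(cs: Seq[Plan]): Plan
      // Bottom-up walk that only descends into `children`.
      def transformUp(rule: PartialFunction[Plan, Plan]): Plan = {
        val rewritten = withChildren(children.map(_.transformUp(rule)))
        rule.applyOrElse(rewritten, (p: Plan) => p)
      }
    }
    case class Query(sql: String) extends Plan {
      def children: Seq[Plan] = Nil
      def withChildren(cs: Seq[Plan]): Plan = this
    }
    case class IdentifierPlaceholder(identifierExpr: Expr) extends Plan {
      def children: Seq[Plan] = Nil
      def withChildren(cs: Seq[Plan]): Plan = this
    }
    // `table` is deliberately NOT part of `children`, like `child = query`.
    case class Insert(table: Plan, query: Plan) extends Plan {
      def children: Seq[Plan] = Seq(query)
      def withChildren(cs: Seq[Plan]): Plan = copy(query = cs.head)
    }

    object NonChildSlotSketch {
      // Stand-in for the session's SQL variables.
      val vars: Map[String, String] = Map("target_table" -> "t")

      private def resolve(e: Expr): Expr = e match {
        case UnresolvedName(n) if vars.contains(n) => VariableRef(n, vars(n))
        case other => other
      }

      // Generic rule: fires only on placeholders the walk actually visits.
      val generic: PartialFunction[Plan, Plan] = {
        case p: IdentifierPlaceholder if !p.identifierExpr.resolved =>
          p.copy(identifierExpr = resolve(p.identifierExpr))
      }

      // Explicit case for the non-child slot; the guard makes a second
      // pass over an already-resolved placeholder a no-op.
      val tableSlot: PartialFunction[Plan, Plan] = {
        case i @ Insert(p: IdentifierPlaceholder, _)
            if !p.identifierExpr.resolved =>
          i.copy(table = p.copy(identifierExpr = resolve(p.identifierExpr)))
      }

      val plan: Plan = Insert(
        IdentifierPlaceholder(UnresolvedName("target_table")),
        Query("SELECT 42 AS a"))

      // Without the explicit case the placeholder is never reached.
      val before: Plan = plan.transformUp(generic)
      // With it, the name inside `table` is rewritten.
      val after: Plan = plan.transformUp(tableSlot.orElse(generic))

      def main(args: Array[String]): Unit = {
        println(before)
        println(after)
      }
    }
    ```
    
    In this sketch `before` equals the input plan because the walk never 
    enters `table`, while `after` carries the resolved reference, and 
    re-running the same rules on `after` leaves it unchanged.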
    
    ### Why are the changes needed?
    
    Without this, `INSERT INTO IDENTIFIER(<sql-variable>) ...` fails analysis: 
the `UnresolvedAttribute` for the variable name sitting inside 
`PlanWithUnresolvedIdentifier.identifierExpr` is never rewritten to a 
`VariableReference`. Since `ResolveIdentifierClause` only fires when 
`identifierExpr.resolved && childrenResolved`, the placeholder never 
materializes; the plan reaches `PreprocessTableInsertion` with an unresolved 
attribute and errors out (e.g. `UNSUPPORTED_INSERT.RDD_BASED`).
    
    Repro on master before this fix:
    
    ```sql
    CREATE TABLE t (a INT) USING PARQUET;
    DECLARE OR REPLACE VARIABLE target_table STRING;
    SET VAR target_table = 't';
    INSERT INTO IDENTIFIER(target_table) SELECT 42 AS a;
    ```
    
    The same shape applies to `OverwriteByExpression.table` (e.g. `REPLACE 
WHERE`, `REPLACE ON`, `REPLACE USING` variants of INSERT) - fixed by the same 
`V2WriteCommand` case.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.  Bug-fix only.
    
    ### How was this patch tested?
    
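    Two new tests added to `ParametersSuite`: one runs `INSERT INTO 
IDENTIFIER(<sql-variable>)` end-to-end, the other checks that a variable name 
colliding with a query output column still binds to the variable.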
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    Co-authored with Claude Code.
    
    Closes #56024 from haoyangeng-db/spark-46625-followup-resolve-identifier.
    
    Authored-by: haoyangeng-db <[email protected]>
    Signed-off-by: Wenchen Fan <[email protected]>
    (cherry picked from commit 96f43d99851586052373cbf7e0eeefef4f30f70d)
    Signed-off-by: Wenchen Fan <[email protected]>
---
 .../spark/sql/catalyst/analysis/Analyzer.scala     | 31 ++++++++++++++++
 .../org/apache/spark/sql/ParametersSuite.scala     | 42 ++++++++++++++++++++++
 2 files changed, 73 insertions(+)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index da2e57c0a649..95a856ac28e6 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -1540,6 +1540,37 @@ class Analyzer(
     }
 
     def doApply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp {
+      // `InsertIntoStatement.table` and `V2WriteCommand.table` are non-child `LogicalPlan`
+      // slots (`child = query`), so the default `resolveOperatorsUp` + `mapExpressions`
+      // traversal never resolves expressions placed inside them. For a
+      // `PlanWithUnresolvedIdentifier`, `identifierExpr` (e.g. an `UnresolvedAttribute`
+      // referring to a SQL variable in `INSERT INTO IDENTIFIER(target_table) ...`) must
+      // be resolved here before `ResolveIdentifierClause` can materialize the relation.
+      // Mirror the structural recursion into the non-child `.table` slot that
+      // `BindParameters` and `ResolveIdentifierClause` already do for the same shape
+      // (SPARK-46625); unlike those rules, this one performs attribute resolution rather
+      // than parameter binding or placeholder materialization. Resolve against `p` (whose
+      // `children` are `Nil` on the INSERT / `OverwriteByExpression` path built by
+      // `buildWriteTableSlot`) so the IDENTIFIER expression cannot see query output
+      // columns -- only the last-resort variable resolution path fires. The
+      // `!identifierExpr.resolved` guard makes the case idempotent under bottom-up
+      // traversal.
+      case i: InsertIntoStatement
+          if i.table.isInstanceOf[PlanWithUnresolvedIdentifier] &&
+             !i.table.asInstanceOf[PlanWithUnresolvedIdentifier].identifierExpr.resolved =>
+        val p = i.table.asInstanceOf[PlanWithUnresolvedIdentifier]
+        val resolvedExpr = resolveExpressionByPlanChildren(
+          p.identifierExpr, p, includeLastResort = true)
+        i.copy(table = p.copy(identifierExpr = resolvedExpr))
+
+      case w: V2WriteCommand
+          if w.table.isInstanceOf[PlanWithUnresolvedIdentifier] &&
+             !w.table.asInstanceOf[PlanWithUnresolvedIdentifier].identifierExpr.resolved =>
+        val p = w.table.asInstanceOf[PlanWithUnresolvedIdentifier]
+        val resolvedExpr = resolveExpressionByPlanChildren(
+          p.identifierExpr, p, includeLastResort = true)
+        w.withNewTable(p.copy(identifierExpr = resolvedExpr))
+
       // Don't wait other rules to resolve the child plans of `InsertIntoStatement` as we need
       // to resolve column "DEFAULT" in the child plans so that they must be unresolved.
       case i: InsertIntoStatement => resolveColumnDefaultInCommandInputQuery(i)
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/ParametersSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/ParametersSuite.scala
index 575fcc058169..ca7732772b58 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/ParametersSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/ParametersSuite.scala
@@ -2586,6 +2586,48 @@ class ParametersSuite extends SharedSparkSession {
      s"Expected :tname inside OverwriteByExpression.table to be bound, got:\n$boundOverwrite")
   }
 
+  // SPARK-46625 followup: `INSERT INTO IDENTIFIER(<sql-variable>) ...` places a
+  // `PlanWithUnresolvedIdentifier` in `InsertIntoStatement.table`, whose `identifierExpr`
+  // holds an `UnresolvedAttribute` for the variable name. That slot is a non-child
+  // `LogicalPlan`, so the default `ResolveReferences` traversal never resolves the
+  // attribute, `ResolveIdentifierClause` cannot fire (it waits on `identifierExpr.resolved`),
+  // and analysis fails. Verify that the explicit `InsertIntoStatement` case added to
+  // `ResolveReferences` rewrites the attribute to a `VariableReference` and the insert
+  // completes end-to-end.
+  test("SPARK-46625: INSERT INTO IDENTIFIER(<sql-variable>) resolves variable in table slot") {
+    withTable("t_var_insert") {
+      sql("CREATE TABLE t_var_insert (a INT) USING PARQUET")
+      sql("DECLARE OR REPLACE VARIABLE target_table STRING")
+      try {
+        sql("SET VAR target_table = 't_var_insert'")
+        sql("INSERT INTO IDENTIFIER(target_table) SELECT 42 AS a")
+        checkAnswer(spark.table("t_var_insert"), Row(42))
+      } finally {
+        sql("DROP TEMPORARY VARIABLE IF EXISTS target_table")
+      }
+    }
+  }
+
+  // SPARK-46625 followup: when the SQL variable name in `IDENTIFIER(<name>)` collides
+  // with a query output column, the IDENTIFIER expression must still bind to the
+  // variable, not to the column. The `ResolveReferences` case for `InsertIntoStatement`
+  // resolves `identifierExpr` against the `PlanWithUnresolvedIdentifier` itself (whose
+  // `children` are `Nil` on this path), not against the surrounding `InsertIntoStatement`
+  // (whose child is `query`), so query output columns are out of scope and only the
+  // last-resort variable resolution path fires.
+  test("SPARK-46625: INSERT INTO IDENTIFIER(<sql-variable>) ignores colliding query columns") {
+    withTable("t_shadow") {
+      sql("CREATE TABLE t_shadow (a INT) USING PARQUET")
+      sql("DECLARE OR REPLACE VARIABLE a STRING DEFAULT 't_shadow'")
+      try {
+        sql("INSERT INTO IDENTIFIER(a) SELECT 42 AS a")
+        checkAnswer(spark.table("t_shadow"), Row(42))
+      } finally {
+        sql("DROP TEMPORARY VARIABLE IF EXISTS a")
+      }
+    }
+  }
+
   // SPARK-46625: `CacheTableAsSelect.tempViewName` is an `Expression` slot, so an
   // `IDENTIFIER(<non-literal>)` produces an `ExpressionWithUnresolvedIdentifier` there instead of
   // wrapping the entire command in a `PlanWithUnresolvedIdentifier`. Verify on the parsed plan


