This is an automated email from the ASF dual-hosted git repository.

cloud-fan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 3922b08a171b [SPARK-57186][SQL] Handle NullType in ExtractValue to 
return NULL instead of throwing
3922b08a171b is described below

commit 3922b08a171bc2973d4a822c53ff5a0d3d185c75
Author: Dejan Krakovic <[email protected]>
AuthorDate: Tue Jun 2 23:55:16 2026 +0800

    [SPARK-57186][SQL] Handle NullType in ExtractValue to return NULL instead 
of throwing
    
    ### What changes were proposed in this pull request?
    
      Extracting a field/element/key from a NULL (`NullType`) base now returns 
NULL (SQL NULL propagation) instead of throwing 
`INVALID_EXTRACT_BASE_FIELD_TYPE`. A `NullType` column can arise, for example, 
from schema evolution with missing
      columns.
    
      The change is implemented at the user-facing resolution sites via a new 
helper, `ExtractValue.applyOrNull`, which returns `Literal(null, NullType)` 
when the base is `NullType` and otherwise delegates to `ExtractValue.apply`. 
The shared
      `ExtractValue.extractValue` is left **unchanged**, so unrelated consumers 
(`ResolveUnion`, higher-order functions, `PivotTransformer`, 
`VariableResolution`, deserializers) keep their prior (throwing) behavior.
    
      - **Multipart field access (`col.a`):** the 
`AttributeSeq.resolveCandidates` fold uses `applyOrNull`. 
`ExtractValue.isExtractable` now treats a `NullType` base as extractable, so 
the single-pass `NameScope` keeps the `NullType` candidate
      and reaches the same fold. As a result `col.a` returns NULL under 
**both** the legacy and single-pass analyzers (a dual-run golden test locks 
this in).
      - **Subscript access (`col[0]`, `col['key']`):** 
`ColumnResolutionHelper`'s `UnresolvedExtractValue` resolution uses 
`applyOrNull`, so these return NULL under the **legacy** analyzer. The result 
is aliased with the extraction's pretty SQL
      (e.g. `col[0]`) so the output column keeps a stable name instead of 
`NULL`. The single-pass resolver does not resolve subscript extraction 
(`UnresolvedExtractValue`) for **any** base type today (a pre-existing 
limitation —
      `ExtractValueResolver` is unwired and `ExpressionResolver` has no 
dispatch case), so these forms run under the legacy analyzer only.
    
      ### Why are the changes needed?
    
      `SELECT col.a` (or `col[0]` / `col['key']`) on a `NullType` column 
previously failed analysis with `INVALID_EXTRACT_BASE_FIELD_TYPE`. Under SQL 
NULL semantics, extracting from NULL should yield NULL rather than error.
    
      ### Does this PR introduce _any_ user-facing change?
    
      Yes. Extracting a struct field, array element, or map key from a 
`NullType` column now returns NULL instead of failing with 
`INVALID_EXTRACT_BASE_FIELD_TYPE`.
    
      ### How was this patch tested?
    
      Golden-file tests under `sql/core/src/test/resources/sql-tests/`:
      - `extract-value-resolution-edge-cases.sql` — `col.a`, `col[0]`, and 
`col['key']` on a `NullType` base all return NULL (legacy analyzer).
      - `extract-value-nulltype-single-pass.sql` — dual-runs `col.a` under 
`spark.sql.analyzer.singlePassResolver.dualRunWithLegacy=true` to lock the 
legacy/single-pass consistency (no `HYBRID_ANALYZER_EXCEPTION`).
      - `having-and-order-by-recursive-type-name-resolution.sql` — a `NullType` 
HAVING shadowing case, documented as the one base type that propagates NULL 
(empty result) rather than throwing like STRING/ARRAY/MAP.
    
      Also ran `AnalysisSuite` and the `extract` / `struct` golden suites; no 
existing golden regresses.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    Yes, co-authored using Claude code.
    
    Closes #56237 from dejankrak-db/extract-value-nulltype-fix-oss.
    
    Lead-authored-by: Dejan Krakovic <[email protected]>
    Co-authored-by: Wenchen Fan <[email protected]>
    Signed-off-by: Wenchen Fan <[email protected]>
---
 .../catalyst/analysis/ColumnResolutionHelper.scala | 16 ++++++++-
 .../catalyst/analysis/TableOutputResolver.scala    |  5 +--
 .../expressions/complexTypeExtractors.scala        | 41 ++++++++++++++++++----
 .../spark/sql/catalyst/expressions/package.scala   |  4 ++-
 .../extract-value-nulltype-single-pass.sql.out     |  8 +++++
 .../extract-value-resolution-edge-cases.sql.out    | 27 ++++++++++++++
 ...order-by-recursive-type-name-resolution.sql.out |  9 +++++
 .../inputs/extract-value-nulltype-single-pass.sql  | 10 ++++++
 .../inputs/extract-value-resolution-edge-cases.sql | 10 ++++++
 ...and-order-by-recursive-type-name-resolution.sql |  7 ++++
 .../extract-value-nulltype-single-pass.sql.out     |  7 ++++
 .../extract-value-resolution-edge-cases.sql.out    | 24 +++++++++++++
 ...order-by-recursive-type-name-resolution.sql.out |  8 +++++
 13 files changed, 165 insertions(+), 11 deletions(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala
index eaa61a2a9163..55488e8eed95 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala
@@ -29,9 +29,11 @@ import 
org.apache.spark.sql.catalyst.expressions.SubExprUtils.wrapOuterReference
 import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.catalyst.trees.CurrentOrigin.withOrigin
 import org.apache.spark.sql.catalyst.trees.TreePattern._
+import org.apache.spark.sql.catalyst.util.toPrettySQL
 import org.apache.spark.sql.connector.catalog.CatalogManager
 import org.apache.spark.sql.errors.{DataTypeErrorsBase, QueryCompilationErrors}
 import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.NullType
 
 trait ColumnResolutionHelper extends Logging with DataTypeErrorsBase {
 
@@ -193,7 +195,19 @@ trait ColumnResolutionHelper extends Logging with 
DataTypeErrorsBase {
             field
           }
           if (newChild.resolved) {
-            ExtractValue(child = newChild, extraction = resolvedField, 
resolver = resolver)
+            // applyOrNull propagates NULL when the base is NullType instead 
of throwing
+            // INVALID_EXTRACT_BASE_FIELD_TYPE, consistent with multipart 
field access (col.a).
+            val extracted = ExtractValue.applyOrNull(
+              child = newChild, extraction = resolvedField, resolver = 
resolver)
+            // A NullType base yields a bare NULL literal, which would 
otherwise produce an output
+            // column named `NULL`. Alias it with the extraction's text (e.g. 
`col[0]`) to keep a
+            // stable column name; CleanupAliases later trims this alias where 
it's not a top-level
+            // projection output.
+            if (newChild.dataType == NullType) {
+              Alias(extracted, toPrettySQL(u.copy(child = newChild, extraction 
= resolvedField)))()
+            } else {
+              extracted
+            }
           } else {
             u.copy(child = newChild, extraction = resolvedField)
           }
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala
index 93d53d9f33a6..aa379a63a2af 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala
@@ -290,8 +290,9 @@ object TableOutputResolver extends SQLConfHelper with 
Logging {
    * that exceed the column length are caught at runtime. Uses `getRawType` so 
it works for both
    * V1 and V2 tables. Shared by the by-name and by-position default-fill 
paths.
    *
-   * `applyColumnMetadata` strips the default's outer alias and re-wraps it 
with the required
-   * metadata, so the length check is applied to the default value itself (the 
alias child).
+   * We unwrap the default's outer alias before the length check so the check 
wraps the
+   * default value itself, not the alias; `applyColumnMetadata` then re-adds 
the required
+   * alias and metadata afterward.
    */
   private def applyDefaultWithLengthCheck(
       defaultExpr: Expression,
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala
index 022e130ec3a5..2f9c1e2dd704 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeExtractors.scala
@@ -52,6 +52,25 @@ object ExtractValue {
     }
   }
 
+  /**
+   * Resolution-time variant of [[apply]]: extracting a field/element/key from 
a NULL (`NullType`)
+   * base yields NULL (SQL NULL propagation) instead of throwing 
`INVALID_EXTRACT_BASE_FIELD_TYPE`.
+   * A `NullType` column can arise e.g. from schema evolution with missing 
columns. This is used by
+   * the user-facing extraction resolution sites (multipart name resolution and
+   * `UnresolvedExtractValue` resolution). `extractValue` itself is left 
unchanged, so the other
+   * direct consumers keep their prior (throwing) behavior.
+   */
+  def applyOrNull(
+      child: Expression,
+      extraction: Expression,
+      resolver: Resolver): Expression = {
+    if (child.dataType == NullType) {
+      Literal(null, NullType)
+    } else {
+      apply(child, extraction, resolver)
+    }
+  }
+
   /**
    * Returns the resolved `ExtractValue`. It will return one kind of concrete 
`ExtractValue`,
    * depend on the type of `child` and `extraction`.
@@ -119,13 +138,21 @@ object ExtractValue {
     val withExtractedNestedFields = nestedFields
       .foldLeft(Some(attribute): Option[Expression]) {
         case (Some(expression), field) =>
-          ExtractValue.extractValue(
-            child = expression,
-            extraction = Literal(field),
-            resolver = resolver
-          ) match {
-            case Left(e) => Some(e)
-            case Right(_) => None
+          // Extraction from a NULL (NullType) base propagates NULL rather 
than failing, matching
+          // the user-facing resolution sites (which use applyOrNull). 
Treating it as extractable
+          // here keeps the NullType candidate in single-pass NameScope 
candidate filtering so it
+          // resolves consistently with the legacy analyzer.
+          if (expression.dataType == NullType) {
+            Some(Literal(null, NullType))
+          } else {
+            ExtractValue.extractValue(
+              child = expression,
+              extraction = Literal(field),
+              resolver = resolver
+            ) match {
+              case Left(e) => Some(e)
+              case Right(_) => None
+            }
           }
         case _ =>
           None
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala
index 114a43c34c04..a3008a949ec0 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala
@@ -396,7 +396,9 @@ package object expressions  {
           // Then this will add ExtractValue("c", ExtractValue("b", a)), and 
alias the final
           // expression as "c".
           val fieldExprs = nestedFields.foldLeft(a: Expression) { (e, name) =>
-            ExtractValue(e, Literal(name), resolver)
+            // applyOrNull propagates NULL when the base is NullType (e.g. a 
NullType column from
+            // schema evolution) instead of throwing 
INVALID_EXTRACT_BASE_FIELD_TYPE.
+            ExtractValue.applyOrNull(e, Literal(name), resolver)
           }
           Seq(Alias(fieldExprs, nestedFields.last)())
 
diff --git 
a/sql/core/src/test/resources/sql-tests/analyzer-results/extract-value-nulltype-single-pass.sql.out
 
b/sql/core/src/test/resources/sql-tests/analyzer-results/extract-value-nulltype-single-pass.sql.out
new file mode 100644
index 000000000000..3def19963fc0
--- /dev/null
+++ 
b/sql/core/src/test/resources/sql-tests/analyzer-results/extract-value-nulltype-single-pass.sql.out
@@ -0,0 +1,8 @@
+-- Automatically generated by SQLQueryTestSuite
+-- !query
+SELECT col.a FROM (SELECT null AS col) t
+-- !query analysis
+Project [null AS a#x]
++- SubqueryAlias t
+   +- Project [null AS col#x]
+      +- OneRowRelation
diff --git 
a/sql/core/src/test/resources/sql-tests/analyzer-results/extract-value-resolution-edge-cases.sql.out
 
b/sql/core/src/test/resources/sql-tests/analyzer-results/extract-value-resolution-edge-cases.sql.out
index 9f34e1a6e4ea..789f5bcb2362 100644
--- 
a/sql/core/src/test/resources/sql-tests/analyzer-results/extract-value-resolution-edge-cases.sql.out
+++ 
b/sql/core/src/test/resources/sql-tests/analyzer-results/extract-value-resolution-edge-cases.sql.out
@@ -38,3 +38,30 @@ DROP TABLE t1
 -- !query analysis
 DropTable false, false
 +- ResolvedIdentifier V2SessionCatalog(spark_catalog), default.t1
+
+
+-- !query
+SELECT col.a FROM (SELECT null AS col) t
+-- !query analysis
+Project [null AS a#x]
++- SubqueryAlias t
+   +- Project [null AS col#x]
+      +- OneRowRelation
+
+
+-- !query
+SELECT col[0] FROM (SELECT null AS col) t
+-- !query analysis
+Project [null AS col[0]#x]
++- SubqueryAlias t
+   +- Project [null AS col#x]
+      +- OneRowRelation
+
+
+-- !query
+SELECT col['key'] FROM (SELECT null AS col) t
+-- !query analysis
+Project [null AS col[key]#x]
++- SubqueryAlias t
+   +- Project [null AS col#x]
+      +- OneRowRelation
diff --git 
a/sql/core/src/test/resources/sql-tests/analyzer-results/having-and-order-by-recursive-type-name-resolution.sql.out
 
b/sql/core/src/test/resources/sql-tests/analyzer-results/having-and-order-by-recursive-type-name-resolution.sql.out
index cfcd9c5c42e2..11f5f948d8f6 100644
--- 
a/sql/core/src/test/resources/sql-tests/analyzer-results/having-and-order-by-recursive-type-name-resolution.sql.out
+++ 
b/sql/core/src/test/resources/sql-tests/analyzer-results/having-and-order-by-recursive-type-name-resolution.sql.out
@@ -500,3 +500,12 @@ Project [sum_val#x]
       +- Aggregate [col1#x], [(col1#x.nums[0] + col1#x.nums[1]) AS sum_val#x, 
col1#x]
          +- SubqueryAlias t
             +- LocalRelation [col1#x]
+
+
+-- !query
+SELECT NAMED_STRUCT('a', 1) AS col1 FROM VALUES (NULL) t (col1) GROUP BY col1 
HAVING col1.a == 1
+-- !query analysis
+Filter (cast(null as int) = 1)
++- Aggregate [col1#x], [named_struct(a, 1) AS col1#x]
+   +- SubqueryAlias t
+      +- LocalRelation [col1#x]
diff --git 
a/sql/core/src/test/resources/sql-tests/inputs/extract-value-nulltype-single-pass.sql
 
b/sql/core/src/test/resources/sql-tests/inputs/extract-value-nulltype-single-pass.sql
new file mode 100644
index 000000000000..19d2154936f9
--- /dev/null
+++ 
b/sql/core/src/test/resources/sql-tests/inputs/extract-value-nulltype-single-pass.sql
@@ -0,0 +1,10 @@
+-- SPARK-57186: multipart field access (col.a) on a NullType base propagates 
NULL under the
+-- single-pass resolver as well, consistently with the legacy analyzer. 
Dual-running both analyzers
+-- locks in that consistency (no HYBRID_ANALYZER_EXCEPTION).
+-- The col[0]/col['key'] subscript forms are intentionally not covered here: 
the single-pass
+-- resolver does not resolve subscript extraction (UnresolvedExtractValue) at 
all -- a pre-existing
+-- limitation independent of NullType -- so they are exercised only under the 
legacy analyzer in
+-- extract-value-resolution-edge-cases.sql.
+--SET spark.sql.analyzer.singlePassResolver.dualRunWithLegacy=true
+
+SELECT col.a FROM (SELECT null AS col) t;
diff --git 
a/sql/core/src/test/resources/sql-tests/inputs/extract-value-resolution-edge-cases.sql
 
b/sql/core/src/test/resources/sql-tests/inputs/extract-value-resolution-edge-cases.sql
index 5a2784d54270..48ebdfc0a3fa 100644
--- 
a/sql/core/src/test/resources/sql-tests/inputs/extract-value-resolution-edge-cases.sql
+++ 
b/sql/core/src/test/resources/sql-tests/inputs/extract-value-resolution-edge-cases.sql
@@ -8,3 +8,13 @@ SELECT col1.a, a FROM t1 ORDER BY col1.a;
 SELECT split(col1, '-')[1] AS a FROM VALUES('a-b') ORDER BY split(col1, 
'-')[1];
 
 DROP TABLE t1;
+
+-- SPARK-57186: extracting a field/element/key from a NullType base returns 
NULL instead of
+-- throwing INVALID_EXTRACT_BASE_FIELD_TYPE (SQL NULL propagation; a NullType 
column can arise e.g.
+-- from schema evolution with missing columns). This applies uniformly to 
dotted field access
+-- (`col.a`) and the subscript forms (`col[0]`, `col['key']`), and is 
implemented at the
+-- user-facing resolution sites (ExtractValue.applyOrNull) without changing 
the shared
+-- ExtractValue.extractValue utility.
+SELECT col.a FROM (SELECT null AS col) t;
+SELECT col[0] FROM (SELECT null AS col) t;
+SELECT col['key'] FROM (SELECT null AS col) t;
diff --git 
a/sql/core/src/test/resources/sql-tests/inputs/having-and-order-by-recursive-type-name-resolution.sql
 
b/sql/core/src/test/resources/sql-tests/inputs/having-and-order-by-recursive-type-name-resolution.sql
index 1f53ca359fe1..de3f5c8cc43f 100644
--- 
a/sql/core/src/test/resources/sql-tests/inputs/having-and-order-by-recursive-type-name-resolution.sql
+++ 
b/sql/core/src/test/resources/sql-tests/inputs/having-and-order-by-recursive-type-name-resolution.sql
@@ -141,3 +141,10 @@ FROM VALUES (NAMED_STRUCT('nums', ARRAY(10, 20))) t (col1)
 GROUP BY col1
 HAVING col1.nums[0] + col1.nums[1] > 25
 ORDER BY col1.nums[0];
+
+-- SPARK-57186: Alias type: Struct, Table column type: NullType (void).
+-- Unlike the STRING/ARRAY/MAP input bases above, which throw 
INVALID_EXTRACT_BASE_FIELD_TYPE for
+-- this shadowing pattern, a NullType input column that shadows the struct 
alias yields NULL
+-- (NULL propagation). The HAVING predicate is therefore NULL and the row is 
filtered out, giving
+-- an empty result. NullType is intentionally the one base type that does not 
error here.
+SELECT NAMED_STRUCT('a', 1) AS col1 FROM VALUES (NULL) t (col1) GROUP BY col1 
HAVING col1.a == 1;
diff --git 
a/sql/core/src/test/resources/sql-tests/results/extract-value-nulltype-single-pass.sql.out
 
b/sql/core/src/test/resources/sql-tests/results/extract-value-nulltype-single-pass.sql.out
new file mode 100644
index 000000000000..e41066684512
--- /dev/null
+++ 
b/sql/core/src/test/resources/sql-tests/results/extract-value-nulltype-single-pass.sql.out
@@ -0,0 +1,7 @@
+-- Automatically generated by SQLQueryTestSuite
+-- !query
+SELECT col.a FROM (SELECT null AS col) t
+-- !query schema
+struct<a:void>
+-- !query output
+NULL
diff --git 
a/sql/core/src/test/resources/sql-tests/results/extract-value-resolution-edge-cases.sql.out
 
b/sql/core/src/test/resources/sql-tests/results/extract-value-resolution-edge-cases.sql.out
index 0565edc99b95..4d6cbb936d77 100644
--- 
a/sql/core/src/test/resources/sql-tests/results/extract-value-resolution-edge-cases.sql.out
+++ 
b/sql/core/src/test/resources/sql-tests/results/extract-value-resolution-edge-cases.sql.out
@@ -37,3 +37,27 @@ DROP TABLE t1
 struct<>
 -- !query output
 
+
+
+-- !query
+SELECT col.a FROM (SELECT null AS col) t
+-- !query schema
+struct<a:void>
+-- !query output
+NULL
+
+
+-- !query
+SELECT col[0] FROM (SELECT null AS col) t
+-- !query schema
+struct<col[0]:void>
+-- !query output
+NULL
+
+
+-- !query
+SELECT col['key'] FROM (SELECT null AS col) t
+-- !query schema
+struct<col[key]:void>
+-- !query output
+NULL
diff --git 
a/sql/core/src/test/resources/sql-tests/results/having-and-order-by-recursive-type-name-resolution.sql.out
 
b/sql/core/src/test/resources/sql-tests/results/having-and-order-by-recursive-type-name-resolution.sql.out
index f685076f9f30..df08b3837b55 100644
--- 
a/sql/core/src/test/resources/sql-tests/results/having-and-order-by-recursive-type-name-resolution.sql.out
+++ 
b/sql/core/src/test/resources/sql-tests/results/having-and-order-by-recursive-type-name-resolution.sql.out
@@ -427,3 +427,11 @@ ORDER BY col1.nums[0]
 struct<sum_val:int>
 -- !query output
 30
+
+
+-- !query
+SELECT NAMED_STRUCT('a', 1) AS col1 FROM VALUES (NULL) t (col1) GROUP BY col1 
HAVING col1.a == 1
+-- !query schema
+struct<col1:struct<a:int>>
+-- !query output
+


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to