This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new ec90eb09df60 [SPARK-55356][SQL] Support alias for PIVOT clause
ec90eb09df60 is described below

commit ec90eb09df60d9d090789b02db457c85cb958fc0
Author: Filip Davidovic <[email protected]>
AuthorDate: Thu Feb 5 16:36:16 2026 +0500

    [SPARK-55356][SQL] Support alias for PIVOT clause
    
    ### What changes were proposed in this pull request?
    
    Add support for alias in PIVOT clause, following the same syntax as 
[T-SQL](https://learn.microsoft.com/en-us/sql/t-sql/queries/from-using-pivot-and-unpivot?view=sql-server-ver17)
 and 
[BigQuery](https://docs.cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#pivot_operator).
    
    Short spec to describe the improvement:
    > Extend the `PIVOT` clause to accept an optional alias (`PIVOT (...) AS 
alias`), allowing the pivoted result set to be referenced by name in the 
surrounding query context. The alias qualifies all columns produced by the 
pivot operation, including both the grouping columns and the dynamically 
generated value columns. When an alias is provided, columns may be referenced 
as `alias.column_name`. When omitted, current unqualified resolution behaviour 
is preserved. In queries with multiple [...]
    
    ### Why are the changes needed?
    
    Main reason for the improvement is parity with platforms like SQL Server. 
When customers want to migrate, their queries using PIVOT alias can't be 
automatically transpiled, requiring a human in the loop.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes, it introduces an optional alias and updates the documentation to 
reflect it. Change is fully backwards compatible.
    
    ### How was this patch tested?
    
    Added end-to-end tests in `SQLQueryTestSuite` and unit tests in 
`PivotParserSuite`.
    ```bash
    build/sbt "catalyst/testOnly *PivotParserSuite"
    
    build/sbt "sql/testOnly *SQLQueryTestSuite -- -z pivot.sql"
    ```
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    Implemented with help of Claude Code.
    
    Closes #54137 from filipdavidovic/feat/sql-pivot-alias.
    
    Authored-by: Filip Davidovic <[email protected]>
    Signed-off-by: Wenchen Fan <[email protected]>
---
 docs/sql-ref-syntax-qry-select-pivot.md            |  23 ++++-
 .../spark/sql/catalyst/parser/SqlBaseParser.g4     |   2 +-
 .../spark/sql/catalyst/parser/AstBuilder.scala     |   9 +-
 .../sql/catalyst/parser/PivotParserSuite.scala     | 111 +++++++++++++++++++++
 .../sql-tests/analyzer-results/pivot.sql.out       |  56 +++++++++++
 .../src/test/resources/sql-tests/inputs/pivot.sql  |  19 ++++
 .../test/resources/sql-tests/results/pivot.sql.out |  31 ++++++
 7 files changed, 247 insertions(+), 4 deletions(-)

diff --git a/docs/sql-ref-syntax-qry-select-pivot.md 
b/docs/sql-ref-syntax-qry-select-pivot.md
index 71f7cc7cff12..fc78fe0d7d2a 100644
--- a/docs/sql-ref-syntax-qry-select-pivot.md
+++ b/docs/sql-ref-syntax-qry-select-pivot.md
@@ -27,7 +27,7 @@ The `PIVOT` clause is used for data perspective. We can get 
the aggregated value
 
 ```sql
 PIVOT ( { aggregate_expression [ AS aggregate_expression_alias ] } [ , ... ]
-    FOR column_list IN ( expression_list ) )
+    FOR column_list IN ( expression_list ) ) [[AS] alias]
 ```
 
 ### Parameters
@@ -47,7 +47,11 @@ PIVOT ( { aggregate_expression [ AS 
aggregate_expression_alias ] } [ , ... ]
 * **expression_list**
 
     Specifies new columns, which are used to match values in `column_list` as 
the aggregating condition. We can also add aliases for them.
-    
+
+* **alias**
+
+    Specifies an alias for the pivot result, which can be used to reference 
the pivot columns in the query.
+
 ### Examples
 
 ```sql
@@ -85,6 +89,21 @@ SELECT * FROM person
 | 300  | Street 3  | NULL  | NULL  | NULL  | NULL  |
 | 400  | Street 4  | NULL  | NULL  | NULL  | NULL  |
 +------+-----------+-------+-------+-------+-------+
+
+-- pivot result can be referenced via its alias
+SELECT pv.* FROM person
+    PIVOT (
+        SUM(age) AS a
+        FOR name IN ('John' AS john, 'Mike' AS mike)
+    ) AS pv;
++------+-----------+---------+---------+
+|  id  |  address  | john_a  | mike_a  |
++------+-----------+---------+---------+
+| 200  | Street 2  | NULL    | NULL    |
+| 100  | Street 1  | 30      | NULL    |
+| 300  | Street 3  | NULL    | 80      |
+| 400  | Street 4  | NULL    | NULL    |
++------+-----------+---------+---------+
 ```
 
 ### Related Statements
diff --git 
a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4 
b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4
index 60bb38121dec..d61dd137ec5a 100644
--- 
a/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4
+++ 
b/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4
@@ -930,7 +930,7 @@ groupingSet
     ;
 
 pivotClause
-    : PIVOT LEFT_PAREN aggregates=namedExpressionSeq FOR pivotColumn IN 
LEFT_PAREN pivotValues+=pivotValue (COMMA pivotValues+=pivotValue)* RIGHT_PAREN 
RIGHT_PAREN
+    : PIVOT LEFT_PAREN aggregates=namedExpressionSeq FOR pivotColumn IN 
LEFT_PAREN pivotValues+=pivotValue (COMMA pivotValues+=pivotValue)* RIGHT_PAREN 
RIGHT_PAREN (AS? errorCapturingIdentifier)?
     ;
 
 pivotColumn
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
index 0e9844d7f1a2..99d76430c3cd 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
@@ -1986,7 +1986,14 @@ class AstBuilder extends DataTypeAstBuilder
           identifier => 
UnresolvedAttribute.quoted(getIdentifierText(identifier))).toSeq)
     }
     val pivotValues = ctx.pivotValues.asScala.map(visitPivotValue)
-    Pivot(None, pivotColumn, pivotValues.toSeq, aggregates, query)
+    val pivot = Pivot(None, pivotColumn, pivotValues.toSeq, aggregates, query)
+
+    if (ctx.errorCapturingIdentifier() != null) {
+      val alias = getIdentifierText(ctx.errorCapturingIdentifier())
+      SubqueryAlias(alias, pivot)
+    } else {
+      pivot
+    }
   }
 
   /**
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PivotParserSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PivotParserSuite.scala
new file mode 100644
index 000000000000..bb31c2adcb6a
--- /dev/null
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PivotParserSuite.scala
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.parser
+
+import org.apache.spark.sql.catalyst.analysis.{AnalysisTest, 
UnresolvedAttribute, UnresolvedFunction}
+import org.apache.spark.sql.catalyst.expressions.Literal
+import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, Pivot}
+
+class PivotParserSuite extends AnalysisTest {
+
+  import CatalystSqlParser._
+  import org.apache.spark.sql.catalyst.dsl.expressions._
+  import org.apache.spark.sql.catalyst.dsl.plans._
+
+  private def assertEqual(sqlCommand: String, plan: LogicalPlan): Unit = {
+    comparePlans(parsePlan(sqlCommand), plan, checkAnalysis = false)
+  }
+
+  test("pivot - alias") {
+    Seq(
+      "SELECT pv.* FROM t PIVOT (sum(a) FOR b IN (1, 2)) pv",
+      "SELECT pv.* FROM t PIVOT (sum(a) FOR b IN (1, 2)) AS pv"
+    ).foreach { sql =>
+      withClue(sql) {
+        assertEqual(
+          sql,
+          Pivot(
+            None,
+            UnresolvedAttribute("b"),
+            Seq(Literal(1), Literal(2)),
+            Seq(UnresolvedFunction("sum", Seq(UnresolvedAttribute("a")), 
isDistinct = false)),
+            table("t"))
+            .subquery("pv")
+            .select(star("pv"))
+        )
+      }
+    }
+  }
+
+  test("pivot - no alias") {
+    assertEqual(
+      "SELECT * FROM t PIVOT (sum(a) FOR b IN (1, 2))",
+      Pivot(
+        None,
+        UnresolvedAttribute("b"),
+        Seq(Literal(1), Literal(2)),
+        Seq(UnresolvedFunction("sum", Seq(UnresolvedAttribute("a")), 
isDistinct = false)),
+        table("t"))
+        .select(star())
+    )
+  }
+
+  test("pivot - alias with qualified column references") {
+    Seq(
+      "SELECT pv.x, pv.y FROM t PIVOT (sum(a) FOR b IN (1, 2)) pv",
+      "SELECT pv.x, pv.y FROM t PIVOT (sum(a) FOR b IN (1, 2)) AS pv"
+    ).foreach { sql =>
+      withClue(sql) {
+        assertEqual(
+          sql,
+          Pivot(
+            None,
+            UnresolvedAttribute("b"),
+            Seq(Literal(1), Literal(2)),
+            Seq(UnresolvedFunction("sum", Seq(UnresolvedAttribute("a")), 
isDistinct = false)),
+            table("t"))
+            .subquery("pv")
+            .select($"pv.x", $"pv.y")
+        )
+      }
+    }
+  }
+
+  test("pivot - alias with multiple aggregations") {
+    Seq(
+      "SELECT pv.* FROM t PIVOT (sum(a) s, avg(a) v FOR b IN (1, 2)) pv",
+      "SELECT pv.* FROM t PIVOT (sum(a) s, avg(a) v FOR b IN (1, 2)) AS pv"
+    ).foreach { sql =>
+      withClue(sql) {
+        assertEqual(
+          sql,
+          Pivot(
+            None,
+            UnresolvedAttribute("b"),
+            Seq(Literal(1), Literal(2)),
+            Seq(
+              UnresolvedFunction("sum", Seq(UnresolvedAttribute("a")), 
isDistinct = false).as("s"),
+              UnresolvedFunction("avg", Seq(UnresolvedAttribute("a")), 
isDistinct = false).as("v")),
+            table("t"))
+            .subquery("pv")
+            .select(star("pv"))
+        )
+      }
+    }
+  }
+}
diff --git 
a/sql/core/src/test/resources/sql-tests/analyzer-results/pivot.sql.out 
b/sql/core/src/test/resources/sql-tests/analyzer-results/pivot.sql.out
index 93f2e240a019..a92a06d0f453 100644
--- a/sql/core/src/test/resources/sql-tests/analyzer-results/pivot.sql.out
+++ b/sql/core/src/test/resources/sql-tests/analyzer-results/pivot.sql.out
@@ -791,3 +791,59 @@ Project [a#x, z#x, b#x, y#x, c#x, x#x, d#x, w#x, 
dotNET#xL, Java#xL]
                         +- Project [course#x, year#x, earnings#x]
                            +- SubqueryAlias courseSales
                               +- LocalRelation [course#x, year#x, earnings#x]
+
+-- !query
+SELECT pv.year, pv.net, pv.jv FROM (
+  SELECT year, course, earnings FROM courseSales
+)
+PIVOT (
+  sum(earnings)
+  FOR course IN ('dotNET' as net, 'Java' as jv)
+) AS pv
+-- !query analysis
+Project [year#x, net#xL, jv#xL]
++- SubqueryAlias pv
+   +- Project [year#x, __pivot_sum(__auto_generated_subquery_name.earnings) AS 
`sum(__auto_generated_subquery_name.earnings)`#x[0] AS net#xL, 
__pivot_sum(__auto_generated_subquery_name.earnings) AS 
`sum(__auto_generated_subquery_name.earnings)`#x[1] AS jv#xL]
+      +- Aggregate [year#x], [year#x, pivotfirst(course#x, 
sum(__auto_generated_subquery_name.earnings)#xL, dotNET, Java, 0, 0) AS 
__pivot_sum(__auto_generated_subquery_name.earnings) AS 
`sum(__auto_generated_subquery_name.earnings)`#x]
+         +- Aggregate [year#x, course#x], [year#x, course#x, sum(earnings#x) 
AS sum(__auto_generated_subquery_name.earnings)#xL]
+            +- SubqueryAlias __auto_generated_subquery_name
+               +- Project [year#x, course#x, earnings#x]
+                  +- SubqueryAlias coursesales
+                     +- View (`courseSales`, [course#x, year#x, earnings#x])
+                        +- Project [cast(course#x as string) AS course#x, 
cast(year#x as int) AS year#x, cast(earnings#x as int) AS earnings#x]
+                           +- Project [course#x, year#x, earnings#x]
+                              +- SubqueryAlias courseSales
+                                 +- LocalRelation [course#x, year#x, 
earnings#x]
+
+
+-- !query
+SELECT pv.year, pv.dotNET, y.s FROM (
+  SELECT year, course, earnings FROM courseSales
+)
+PIVOT (
+  sum(earnings)
+  FOR course IN ('dotNET', 'Java')
+) pv
+JOIN years y ON pv.year = y.y
+-- !query analysis
+Project [year#x, dotNET#xL, s#x]
++- Join Inner, (year#x = y#x)
+   :- SubqueryAlias pv
+   :  +- Project [year#x, __pivot_sum(__auto_generated_subquery_name.earnings) 
AS `sum(__auto_generated_subquery_name.earnings)`#x[0] AS dotNET#xL, 
__pivot_sum(__auto_generated_subquery_name.earnings) AS 
`sum(__auto_generated_subquery_name.earnings)`#x[1] AS Java#xL]
+   :     +- Aggregate [year#x], [year#x, pivotfirst(course#x, 
sum(__auto_generated_subquery_name.earnings)#xL, dotNET, Java, 0, 0) AS 
__pivot_sum(__auto_generated_subquery_name.earnings) AS 
`sum(__auto_generated_subquery_name.earnings)`#x]
+   :        +- Aggregate [year#x, course#x], [year#x, course#x, 
sum(earnings#x) AS sum(__auto_generated_subquery_name.earnings)#xL]
+   :           +- SubqueryAlias __auto_generated_subquery_name
+   :              +- Project [year#x, course#x, earnings#x]
+   :                 +- SubqueryAlias coursesales
+   :                    +- View (`courseSales`, [course#x, year#x, earnings#x])
+   :                       +- Project [cast(course#x as string) AS course#x, 
cast(year#x as int) AS year#x, cast(earnings#x as int) AS earnings#x]
+   :                          +- Project [course#x, year#x, earnings#x]
+   :                             +- SubqueryAlias courseSales
+   :                                +- LocalRelation [course#x, year#x, 
earnings#x]
+   +- SubqueryAlias y
+      +- SubqueryAlias years
+         +- View (`years`, [y#x, s#x])
+            +- Project [cast(y#x as int) AS y#x, cast(s#x as int) AS s#x]
+               +- Project [y#x, s#x]
+                  +- SubqueryAlias years
+                     +- LocalRelation [y#x, s#x]
diff --git a/sql/core/src/test/resources/sql-tests/inputs/pivot.sql 
b/sql/core/src/test/resources/sql-tests/inputs/pivot.sql
index c2ecd97e2b02..fcc145959d87 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/pivot.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/pivot.sql
@@ -298,3 +298,22 @@ PIVOT (
   sum(Earnings)
   FOR Course IN ('dotNET', 'Java')
 );
+
+-- pivot with alias and target column aliases
+SELECT pv.year, pv.net, pv.jv FROM (
+  SELECT year, course, earnings FROM courseSales
+)
+PIVOT (
+  sum(earnings)
+  FOR course IN ('dotNET' as net, 'Java' as jv)
+) AS pv;
+
+-- pivot with alias - join with another table
+SELECT pv.year, pv.dotNET, y.s FROM (
+  SELECT year, course, earnings FROM courseSales
+)
+PIVOT (
+  sum(earnings)
+  FOR course IN ('dotNET', 'Java')
+) pv
+JOIN years y ON pv.year = y.y;
diff --git a/sql/core/src/test/resources/sql-tests/results/pivot.sql.out 
b/sql/core/src/test/resources/sql-tests/results/pivot.sql.out
index aaafb19cb979..452901e266d3 100644
--- a/sql/core/src/test/resources/sql-tests/results/pivot.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/pivot.sql.out
@@ -562,3 +562,34 @@ PIVOT (
 
struct<a:string,z:string,b:string,y:string,c:string,x:string,d:string,w:string,dotNET:bigint,Java:bigint>
 -- !query output
 a      z       b       y       c       x       d       w       63000   50000
+
+
+-- !query
+SELECT pv.year, pv.net, pv.jv FROM (
+  SELECT year, course, earnings FROM courseSales
+)
+PIVOT (
+  sum(earnings)
+  FOR course IN ('dotNET' as net, 'Java' as jv)
+) AS pv
+-- !query schema
+struct<year:int,net:bigint,jv:bigint>
+-- !query output
+2012   15000   20000
+2013   48000   30000
+
+
+-- !query
+SELECT pv.year, pv.dotNET, y.s FROM (
+  SELECT year, course, earnings FROM courseSales
+)
+PIVOT (
+  sum(earnings)
+  FOR course IN ('dotNET', 'Java')
+) pv
+JOIN years y ON pv.year = y.y
+-- !query schema
+struct<year:int,dotNET:bigint,s:int>
+-- !query output
+2012   15000   1
+2013   48000   2


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to