Quanlong Huang created IMPALA-15020:
---------------------------------------
Summary: PIVOT should support expressions on aggregates
Key: IMPALA-15020
URL: https://issues.apache.org/jira/browse/IMPALA-15020
Project: IMPALA
Issue Type: Bug
Components: Frontend
Reporter: Quanlong Huang
Assignee: Xuebin Su
The following query is not supported:
{code:sql}
select y2009, y2010 from functional.alltypes pivot(
count(*) - count(id) for year in (2009 as y2009, 2010 as y2010)) t;
AnalysisException: The function called in the PIVOT clause should be an
aggregate{code}
It can be rewritten to the following query which can run:
{code:sql}
select aggif(year=2009, c) as y2009, aggif(year=2010, c) as y2010 from (
select year, count(*) - count(id) as c
from functional.alltypes group by year) t;{code}
Verified that SparkSQL supports this in
[https://www.sparkplayground.com/pyspark-online-compiler]
{code:python}
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('Spark Playground').getOrCreate()
df = spark.sql("""with v (id, year) as (
values (1, 2009),
(NULL, 2009),
(2, 2010),
(3, 2010)
)
select y2009, y2010 from v pivot(
count(*) - count(id) for year in (2009 as y2009, 2010 as y2010)
)""")
display(df){code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)