Quanlong Huang created IMPALA-15020:
---------------------------------------

             Summary: PIVOT should support expressions on aggregates
                 Key: IMPALA-15020
                 URL: https://issues.apache.org/jira/browse/IMPALA-15020
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
            Reporter: Quanlong Huang
            Assignee: Xuebin Su


The following query is not supported:
{code:sql}
select y2009, y2010 from functional.alltypes pivot(
  count(*) - count(id) for year in (2009 as y2009, 2010 as y2010)) t;

AnalysisException: The function called in the PIVOT clause should be an 
aggregate{code}
It can be rewritten to the following query which can run:
{code:sql}
select aggif(year=2009, c) as y2009, aggif(year=2010, c) as y2010 from (
  select year, count(*) - count(id) as c
  from functional.alltypes group by year) t;{code}
Verified that SparkSQL supports this in 
[https://www.sparkplayground.com/pyspark-online-compiler]
{code:python}
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('Spark Playground').getOrCreate()

df = spark.sql("""with v (id, year) as (
  values (1, 2009),
    (NULL, 2009),
    (2, 2010),
    (3, 2010)
)
select y2009, y2010 from v pivot(
  count(*) - count(id) for year in (2009 as y2009, 2010 as y2010)
)""")

display(df){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to