[jira] [Updated] (IMPALA-7204) Add support for GROUP BY ROLLUP

Greg Rahn (Jira) Fri, 01 Nov 2019 10:42:07 -0700


     [ 
https://issues.apache.org/jira/browse/IMPALA-7204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Greg Rahn updated IMPALA-7204:
------------------------------
    Labels: sql-language  (was: GROUP_BY sql)

> Add support for GROUP BY ROLLUP
> -------------------------------
>
>                 Key: IMPALA-7204
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7204
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Backend
>    Affects Versions: Impala 3.0, Impala 2.12.0
>            Reporter: Ruslan Dautkhanov
>            Priority: Major
>              Labels: sql-language
>
> Now suppose that we'd like to analyze our sales data, to study the amount of 
> sales that is occurring for different products, in different states and 
> regions. Using the ROLLUP feature of SQL 2003, we could issue the query:
> {code:sql}
> select region, state, product, sum(sales) total_sales
> from sales_history 
> group by rollup (region, state, product)
> {code}
> Semantically, the above query is equivalent to
>  
> {code:sql}
> select region, state, product, sum(sales) total_sales
> from sales_history 
> group by region, state, product
> union
> select region, state, null, sum(sales) total_sales
> from sales_history 
> group by region, state
> union
> select region, null, null, sum(sales) total_sales
> from sales_history 
> group by region
> union
> select null, null, null, sum(sales) total_sales
> from sales_history
>  
> {code}
> The query might produce results that looked something like:
> {noformat}
> REGION STATE PRODUCT TOTAL_SALES
> ------ ----- ------- -----------
> null null null 6200
> EAST MA BOATS 100
> EAST MA CARS 1500
> EAST MA null 1600
> EAST NY BOATS 150
> EAST NY CARS 1000
> EAST NY null 1150
> EAST null null 2750
> WEST CA BOATS 750
> WEST CA CARS 500
> WEST CA null 1250
> WEST AZ BOATS 2000
> WEST AZ CARS 200
> WEST AZ null 2200
> WEST null null 3450
> {noformat}
> We have a lot of production queries that work around this missing Impala 
> functionality by having three UNION ALLs. Physical execution plan shows 
> Impala actually reads full fact table three times. So it could be a three 
> times improvement (or more, depending on number of columns that are being 
> rolled up).
> I can't find another SQL on Hadoop engine that doesn't support this feature. 
>  *Checked Spark, Hive, PIG, Flink and some other engines - they all do 
> support this basic SQL feature*.
> Would be great to have a matching feature in Impala too.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Updated] (IMPALA-7204) Add support for GROUP BY ROLLUP

Reply via email to