[
https://issues.apache.org/jira/browse/IMPALA-7204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Greg Rahn updated IMPALA-7204:
------------------------------
Labels: sql-language (was: GROUP_BY sql)
> Add support for GROUP BY ROLLUP
> -------------------------------
>
> Key: IMPALA-7204
> URL: https://issues.apache.org/jira/browse/IMPALA-7204
> Project: IMPALA
> Issue Type: New Feature
> Components: Backend
> Affects Versions: Impala 3.0, Impala 2.12.0
> Reporter: Ruslan Dautkhanov
> Priority: Major
> Labels: sql-language
>
> Now suppose that we'd like to analyze our sales data to study the volume of
> sales for different products, in different states and regions. Using the
> ROLLUP feature of SQL 2003, we could issue the query:
> {code:sql}
> select region, state, product, sum(sales) total_sales
> from sales_history
> group by rollup (region, state, product)
> {code}
> Semantically, the above query is equivalent to
>
> {code:sql}
> select region, state, product, sum(sales) total_sales
> from sales_history
> group by region, state, product
> union all
> select region, state, null, sum(sales) total_sales
> from sales_history
> group by region, state
> union all
> select region, null, null, sum(sales) total_sales
> from sales_history
> group by region
> union all
> select null, null, null, sum(sales) total_sales
> from sales_history
> {code}
> The query might produce results like the following:
> {noformat}
> REGION STATE PRODUCT TOTAL_SALES
> ------ ----- ------- -----------
> null   null  null           6200
> EAST   MA    BOATS           100
> EAST   MA    CARS           1500
> EAST   MA    null           1600
> EAST   NY    BOATS           150
> EAST   NY    CARS           1000
> EAST   NY    null           1150
> EAST   null  null           2750
> WEST   CA    BOATS           750
> WEST   CA    CARS            500
> WEST   CA    null           1250
> WEST   AZ    BOATS          2000
> WEST   AZ    CARS            200
> WEST   AZ    null           2200
> WEST   null  null           3450
> {noformat}
> We have a lot of production queries that work around this missing Impala
> functionality with three UNION ALLs. The physical execution plan shows that
> Impala actually reads the full fact table three times, so native support
> could be a threefold improvement (or more, depending on the number of
> columns being rolled up).
> I can't find another SQL-on-Hadoop engine that doesn't support this feature.
> *I checked Spark, Hive, Pig, Flink and some other engines, and they all
> support this basic SQL feature*.
> It would be great to have a matching feature in Impala too.
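
The equivalence described in the issue can be checked on a small data set. The sketch below is an illustration only, not Impala code: it assumes Python's built-in sqlite3 module as a stand-in engine, and the eight detail rows are reconstructed from the sample output quoted in the description. The names ROWS, ROLLUP_EMULATION and rollup_totals are hypothetical. It runs the UNION ALL emulation of the rollup and collects the totals per (region, state, product) key.

```python
import sqlite3

# Detail rows reconstructed from the sample output in the issue description
# (an assumption: the issue does not show the underlying table contents).
ROWS = [
    ("EAST", "MA", "BOATS", 100),
    ("EAST", "MA", "CARS", 1500),
    ("EAST", "NY", "BOATS", 150),
    ("EAST", "NY", "CARS", 1000),
    ("WEST", "CA", "BOATS", 750),
    ("WEST", "CA", "CARS", 500),
    ("WEST", "AZ", "BOATS", 2000),
    ("WEST", "AZ", "CARS", 200),
]

# UNION ALL emulation of GROUP BY ROLLUP (region, state, product).
# Every branch is a separate query over sales_history.
ROLLUP_EMULATION = """
select region, state, product, sum(sales) as total_sales
from sales_history
group by region, state, product
union all
select region, state, null, sum(sales)
from sales_history
group by region, state
union all
select region, null, null, sum(sales)
from sales_history
group by region
union all
select null, null, null, sum(sales)
from sales_history
"""


def rollup_totals():
    """Load the sample rows into an in-memory table and run the emulation."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table sales_history "
        "(region text, state text, product text, sales integer)"
    )
    conn.executemany("insert into sales_history values (?, ?, ?, ?)", ROWS)
    rows = conn.execute(ROLLUP_EMULATION).fetchall()
    conn.close()
    # SQL NULL comes back as None, marking a rolled-up column.
    return {(region, state, product): total
            for region, state, product, total in rows}


totals = rollup_totals()
```

Here a key such as `("EAST", None, None)` holds the subtotal for a region and `(None, None, None)` holds the grand total. Each branch of the emulation queries sales_history on its own, which is the repeated-scan cost the description attributes to the workaround; a native ROLLUP could in principle produce the same rows from a single scan.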