[jira] [Commented] (IMPALA-2792) Syntactic sugar for computing aggregates over nested collections.

Gabor Kaszab (Jira) Thu, 02 Apr 2020 01:18:29 -0700


    [ 
https://issues.apache.org/jira/browse/IMPALA-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073494#comment-17073494
 ]


Gabor Kaszab commented on IMPALA-2792:
--------------------------------------

I found this in a doc about Impala complex types:

A possibly more natural and faster way to express aggregations on complex types 
is to introduce “collection aggregate functions”. Each such function take as 
argument a column-reference expression that may mention collection types along 
its path. For example, to get the average number of orders per customer we 
would usually write:
SELECT AVG(cnt) FROM (
 SELECT COUNT(ord.id) cnt FROM customer cust, cust.orders ord
 GROUP BY cust.id)
Instead, we could use “collection aggregate functions” to write more concisely:
SELECT AVG(COUNT(orders)) FROM customer

The query using the collection aggregate function does not need unnesting. The 
COUNT function accepts a collection type (orders) and simply extracts the 
number of items from the collection. Note that the new COUNT function is not an 
aggregate function in the traditional sense because it does not reduce the 
cardinality of the input.

You can imagine similar variants of such collection aggregates, for example, 
the following query returns the average price of orders per customer.
SELECT AVG(price in orders) FROM customer
The new “in” expression produces an array of prices from an array of 
struct-typed orders, and the new AVG collection aggregate produces the average 
price.



> Syntactic sugar for computing aggregates over nested collections.
> -----------------------------------------------------------------
>
>                 Key: IMPALA-2792
>                 URL: https://issues.apache.org/jira/browse/IMPALA-2792
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Frontend
>    Affects Versions: Impala 2.3.0
>            Reporter: Alexander Behm
>            Assignee: Tamas Mate
>            Priority: Major
>              Labels: complextype, nested_types, planner, ramp-up, usability
>
> For user convenience and SQL brevity, we should add syntax extensions to 
> concisely express aggregates over nested collections. Internally, we should 
> re-write the concise versions into the more verbose equivalent with a 
> correlated inline view.
> Example A:
> {code}
> New syntax:
> select count(c.orders) from customer c
> Internally rewrite to:
> select cnt from customer c, (select count(*) from c.orders) v
> {code}
> Example B:
> {code}
> New syntax:
> select avg(c.orders.items.price) from customer c
> Internally rewrite to:
> select a from customer c, (select avg(price) from c.orders.items) v
> {code}
> I suggest performing the rewrite inside StmtRewriter.java after rewriting all 
> subqueries from the WHERE clause.
> Similar syntactic improvements should be considered for analytic functions on 
> nested collections.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (IMPALA-2792) Syntactic sugar for computing aggregates over nested collections.

Reply via email to