[
https://issues.apache.org/jira/browse/IMPALA-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073494#comment-17073494
]
Gabor Kaszab commented on IMPALA-2792:
--------------------------------------
I found this in a doc about Impala complex types:
A possibly more natural and faster way to express aggregations on complex types
is to introduce “collection aggregate functions”. Each such function take as
argument a column-reference expression that may mention collection types along
its path. For example, to get the average number of orders per customer we
would usually write:
SELECT AVG(cnt) FROM (
SELECT COUNT(ord.id) cnt FROM customer cust, cust.orders ord
GROUP BY cust.id)
Instead, we could use “collection aggregate functions” to write more concisely:
SELECT AVG(COUNT(orders)) FROM customer
The query using the collection aggregate function does not need unnesting. The
COUNT function accepts a collection type (orders) and simply extracts the
number of items from the collection. Note that the new COUNT function is not an
aggregate function in the traditional sense because it does not reduce the
cardinality of the input.
You can imagine similar variants of such collection aggregates, for example,
the following query returns the average price of orders per customer.
SELECT AVG(price in orders) FROM customer
The new “in” expression produces an array of prices from an array of
struct-typed orders, and the new AVG collection aggregate produces the average
price.
> Syntactic sugar for computing aggregates over nested collections.
> -----------------------------------------------------------------
>
> Key: IMPALA-2792
> URL: https://issues.apache.org/jira/browse/IMPALA-2792
> Project: IMPALA
> Issue Type: New Feature
> Components: Frontend
> Affects Versions: Impala 2.3.0
> Reporter: Alexander Behm
> Assignee: Tamas Mate
> Priority: Major
> Labels: complextype, nested_types, planner, ramp-up, usability
>
> For user convenience and SQL brevity, we should add syntax extensions to
> concisely express aggregates over nested collections. Internally, we should
> re-write the concise versions into the more verbose equivalent with a
> correlated inline view.
> Example A:
> {code}
> New syntax:
> select count(c.orders) from customer c
> Internally rewrite to:
> select cnt from customer c, (select count(*) from c.orders) v
> {code}
> Example B:
> {code}
> New syntax:
> select avg(c.orders.items.price) from customer c
> Internally rewrite to:
> select a from customer c, (select avg(price) from c.orders.items) v
> {code}
> I suggest performing the rewrite inside StmtRewriter.java after rewriting all
> subqueries from the WHERE clause.
> Similar syntactic improvements should be considered for analytic functions on
> nested collections.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]