[
https://issues.apache.org/jira/browse/HIVE-28760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17926772#comment-17926772
]
Stamatis Zampetakis commented on HIVE-28760:
--------------------------------------------
Given that WITHIN GROUP support already exists for other aggregate functions
the changes that are needed to support the clause in COLLECT_LIST function
would be mostly limited around {{GenericUDAFCollectList}} and
{{GenericUDAFMkCollectionEvaluator}}.
> Support WITHIN GROUP in COLLECT_LIST aggregate function
> -------------------------------------------------------
>
> Key: HIVE-28760
> URL: https://issues.apache.org/jira/browse/HIVE-28760
> Project: Hive
> Issue Type: New Feature
> Components: SQL
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Major
>
> The WITHIN GROUP clause lets aggregate functions operate on a sorted list of
> rows and its already supported by Hive in some aggregate functions (e.g.
> HIVE-16255).
> The goal of this ticket is to support the use of WITHIN GROUP along
> COLLECT_LIST aggregate function to be able to impose an order on the array
> returned by COLLECT_LIST.
> This feature would be extremely useful for scenarios like the one described
> below.
> {code:sql}
> CREATE TABLE customer (id INT, name STRING, age INT, country STRING);
> INSERT INTO customer VALUES
> (0, 'Bob', 22, 'GREECE'),
> (1, 'Alice', 34, 'FRANCE'),
> (2, 'Victor', 80, 'GERMANY'),
> (3, 'Alicia', 55, 'GREECE'),
> (4, 'Frank', 18, 'FRANCE'),
> (5, 'Robin', 15, 'FRANCE'),
> (6, 'Paul', 75, 'GERMANY');
> {code}
> _Group all customers by country and for each country collect all the customer
> names sorted by the age of each customer._
> {code:sql}
> SELECT country, COLLECT_LIST(name) WITHIN GROUP (ORDER BY age) as all_names
> FROM customer
> GROUP BY country;
> {code}
> {noformat}
> +----------+----------------------------+
> | country | all_names |
> +----------+----------------------------+
> | GERMANY | ["Paul","Victor"] |
> | FRANCE | ["Robin","Frank","Alice"] |
> | GREECE | ["Bob","Alicia"] |
> +----------+----------------------------+
> {noformat}
> There are a few other ways to achieve the same result currently in Hive (e.g.
> window functions, sort_arrray_by) but they are more cumbersome to write and
> depending how they are written less efficient.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)