[jira] [Commented] (HIVE-28760) Support WITHIN GROUP in COLLECT_LIST aggregate function

Stamatis Zampetakis (Jira) Thu, 13 Feb 2025 05:25:38 -0800


    [ 
https://issues.apache.org/jira/browse/HIVE-28760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17926772#comment-17926772
 ]


Stamatis Zampetakis commented on HIVE-28760:
--------------------------------------------

Given that WITHIN GROUP support already exists for other aggregate functions 
the changes that are needed to support the clause in COLLECT_LIST function 
would be mostly limited around {{GenericUDAFCollectList}} and 
{{GenericUDAFMkCollectionEvaluator}}.

> Support WITHIN GROUP in COLLECT_LIST aggregate function
> -------------------------------------------------------
>
>                 Key: HIVE-28760
>                 URL: https://issues.apache.org/jira/browse/HIVE-28760
>             Project: Hive
>          Issue Type: New Feature
>          Components: SQL
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>
> The WITHIN GROUP clause lets aggregate functions operate on a sorted list of 
> rows and its already supported by Hive in some aggregate functions (e.g. 
> HIVE-16255). 
> The goal of this ticket is to support the use of WITHIN GROUP along 
> COLLECT_LIST aggregate function to be able to impose an order on the array 
> returned by COLLECT_LIST.
> This feature would be extremely useful for scenarios like the one described 
> below.
> {code:sql}
> CREATE TABLE customer (id INT, name STRING, age INT, country STRING);
> INSERT INTO customer VALUES 
> (0, 'Bob', 22, 'GREECE'),
> (1, 'Alice', 34, 'FRANCE'),
> (2, 'Victor', 80, 'GERMANY'),
> (3, 'Alicia', 55, 'GREECE'),
> (4, 'Frank', 18, 'FRANCE'),
> (5, 'Robin', 15, 'FRANCE'),
> (6, 'Paul', 75, 'GERMANY');
> {code}
> _Group all customers by country and for each country collect all the customer 
> names sorted by the age of each customer._
> {code:sql}
> SELECT country, COLLECT_LIST(name) WITHIN GROUP (ORDER BY age) as all_names
> FROM customer
> GROUP BY country;
> {code}
> {noformat}
> +----------+----------------------------+
> | country  |         all_names          |
> +----------+----------------------------+
> | GERMANY  | ["Paul","Victor"]          |
> | FRANCE   | ["Robin","Frank","Alice"]  |
> | GREECE   | ["Bob","Alicia"]           |
> +----------+----------------------------+
> {noformat}
> There are a few other ways to achieve the same result currently in Hive (e.g. 
> window functions, sort_arrray_by) but they are more cumbersome to write and 
> depending how they are written less efficient. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (HIVE-28760) Support WITHIN GROUP in COLLECT_LIST aggregate function

Reply via email to