[ 
https://issues.apache.org/jira/browse/HIVE-13019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dustin Cote updated HIVE-13019:
-------------------------------
    Component/s: Logical Optimizer

> Optimizer COLLECT_LIST/COLLECT_SET 
> -----------------------------------
>
>                 Key: HIVE-13019
>                 URL: https://issues.apache.org/jira/browse/HIVE-13019
>             Project: Hive
>          Issue Type: Improvement
>          Components: Logical Optimizer
>            Reporter: Dustin Cote
>            Priority: Minor
>
> Currently, when a query uses COLLECT_SET/COLLECT_LIST over data from a 
> single table, the aggregation is performed after any JOIN operation 
> present in the query.  For example:
> {code}
> insert into table nested_customers_orders
> select c.*, collect_list(named_struct("oid", o.oid, "order_date", o.date...))
> from customers c inner join orders o on (c.cid = o.oid)
> group by o.oid, o.date,...
> {code}
> If the optimizer could perform the COLLECT_LIST first (where possible), 
> before the JOIN, the join would have fewer rows to process and queries of 
> this pattern should see some performance gains.
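>
> Until the optimizer does this on its own, a similar plan can probably be 
> forced by hand by aggregating orders in a subquery and joining the result. 
> This is a minimal sketch, not a tested query. It assumes orders carries a 
> customer key, here called cid (a hypothetical column name that does not 
> appear in the example above), and it keeps only the two struct fields that 
> the example spells out:
> {code}
> select c.*, o_agg.orders
> from customers c
> inner join (
>   -- aggregate orders on its own, before the join
>   -- (cid is an assumed customer key on orders; `date` is backquoted
>   --  because it may be treated as a reserved word)
>   select o.cid,
>          collect_list(named_struct("oid", o.oid, "order_date", o.`date`)) as orders
>   from orders o
>   group by o.cid
> ) o_agg on (c.cid = o_agg.cid)
> {code}
> Written this way, the join consumes one pre-aggregated row per customer 
> key instead of one row per order.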



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)