[
https://issues.apache.org/jira/browse/IMPALA-10099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shant Hovsepian updated IMPALA-10099:
-------------------------------------
Description:
The implementation of SetOperations for EXCEPT/INTERSECT in IMPALA-9943
produced query rewrites that apply the DISTINCT aggregation after exchanges
in distributed plans. Where the query can be rewritten directly to apply
DISTINCT to the set operation operands, doing so would result in better
performance for most large queries.
This should help the performance of TPC-DS Q14, which does an INTERSECT of
queries with large result sets that contain many duplicates.
In general it would be better to have an optimization phase during planning
that moves DISTINCT around, which would handle this case as well as many
others.
was:
The implementation of SetOperations for EXCEPT/INTER in IMPALA-9943 produced
query rewrites that would apply DISTINCT aggregation after exchanges for
distributed plans. In case where the query can be directly rewritten to apply
the DISTINCT to the set operation operands would result in better performance
for most large queries.
This should help the performance TPC-DS Q14 which does an INTERSECT of queries
with large result sets that contain many duplicates.
In general it would better to have DISTINCT move around optimization phase
during planning which would handle this case as well as many others.
> Push down DISTINCT aggregation for EXCEPT/INTERSECT
> ---------------------------------------------------
>
> Key: IMPALA-10099
> URL: https://issues.apache.org/jira/browse/IMPALA-10099
> Project: IMPALA
> Issue Type: Improvement
> Reporter: Shant Hovsepian
> Assignee: Shant Hovsepian
> Priority: Major
>
> The implementation of SetOperations for EXCEPT/INTERSECT in IMPALA-9943
> produced query rewrites that apply the DISTINCT aggregation after exchanges
> in distributed plans. Where the query can be rewritten directly to apply
> DISTINCT to the set operation operands, doing so would result in better
> performance for most large queries.
> This should help the performance of TPC-DS Q14, which does an INTERSECT of
> queries with large result sets that contain many duplicates.
> In general it would be better to have an optimization phase during planning
> that moves DISTINCT around, which would handle this case as well as many
> others.
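
For illustration, the rewrite described in this issue can be sketched with
Python's built-in sqlite3 module. This is only a minimal sketch of the
semantic equivalence, not Impala's planner code: the tables store_items and
web_items and their contents are hypothetical, SQLite stands in for Impala,
and nothing here measures performance. It shows that applying DISTINCT to
each operand of INTERSECT or EXCEPT leaves the result unchanged, which is
what makes the push-down safe when set (non-ALL) semantics are used.

```python
import sqlite3


def run(conn, sql):
    """Execute a query and return its rows in sorted order."""
    return sorted(conn.execute(sql).fetchall())


# Hypothetical operand tables containing many duplicate values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store_items (item_id INTEGER);
    CREATE TABLE web_items (item_id INTEGER);
    INSERT INTO store_items VALUES (1), (1), (2), (2), (2), (3);
    INSERT INTO web_items VALUES (2), (2), (3), (3), (4), (4);
""")

# Set operation on the raw operands: duplicates survive until the
# set operation itself removes them.
INTERSECT_ORIGINAL = """
    SELECT item_id FROM store_items
    INTERSECT
    SELECT item_id FROM web_items
"""

# Same query with DISTINCT pushed down to each operand, so duplicates
# are removed before the operands are combined.
INTERSECT_PUSHED_DOWN = """
    SELECT DISTINCT item_id FROM store_items
    INTERSECT
    SELECT DISTINCT item_id FROM web_items
"""

EXCEPT_ORIGINAL = """
    SELECT item_id FROM store_items
    EXCEPT
    SELECT item_id FROM web_items
"""

EXCEPT_PUSHED_DOWN = """
    SELECT DISTINCT item_id FROM store_items
    EXCEPT
    SELECT DISTINCT item_id FROM web_items
"""

# Both forms of each query return the same rows.
assert run(conn, INTERSECT_ORIGINAL) == run(conn, INTERSECT_PUSHED_DOWN)
assert run(conn, EXCEPT_ORIGINAL) == run(conn, EXCEPT_PUSHED_DOWN)
```

In a distributed plan the expected benefit of the second form is that each
operand is deduplicated before its rows cross an exchange, so fewer rows are
shipped between nodes. That benefit is the claim made in the description
above and is not demonstrated by this single-process sketch.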