Aman Sinha created IMPALA-9850:
----------------------------------
Summary: Avoid doing expensive join at the coordinator
Key: IMPALA-9850
URL: https://issues.apache.org/jira/browse/IMPALA-9850
Project: IMPALA
Issue Type: Improvement
Components: Frontend
Affects Versions: Impala 3.4.0
Reporter: Aman Sinha
Assignee: Aman Sinha
When we join two subqueries, each of which has its root fragment executing on
the coordinator, the join itself is currently also executed on the coordinator.
Here's an example query:
{noformat}
select count(*) from
  (select rank() over (order by l_quantity) x from lineitem) dt1
inner join
  (select rank() over (order by l_shipdate) y from lineitem) dt2
on dt1.x = dt2.y
{noformat}
This can result in poor performance, since the join inputs can be quite large
and the whole join then runs on a single node. Ideally, we want to
re-distribute both intermediate results after the rank() has been computed
and do the join on executor nodes.
A similar scenario is a join of subqueries that each have an ORDER BY with a
LIMIT.
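As a hypothetical illustration of the ORDER BY LIMIT case (this query is not
taken from an actual report; it assumes the same TPC-H lineitem table as above,
and the limit values are arbitrary), a query of roughly this shape would be
expected to show the same pattern, since each subquery's final ordering and
limit are applied in a single fragment before the join:
{noformat}
-- Sketch only: each derived table ends in ORDER BY ... LIMIT.
select count(*) from
  (select l_orderkey from lineitem order by l_quantity limit 1000000) dt1
inner join
  (select l_orderkey from lineitem order by l_shipdate limit 1000000) dt2
on dt1.l_orderkey = dt2.l_orderkey
{noformat}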