[jira] [Created] (IMPALA-9850) Avoid doing expensive join at the coordinator

Aman Sinha (Jira) Thu, 11 Jun 2020 15:27:25 -0700

Aman Sinha created IMPALA-9850:
----------------------------------

             Summary: Avoid doing expensive join at the coordinator
                 Key: IMPALA-9850
                 URL: https://issues.apache.org/jira/browse/IMPALA-9850
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
    Affects Versions: Impala 3.4.0
            Reporter: Aman Sinha
            Assignee: Aman Sinha



When we join two subqueries, each of which has its root fragment executing on
the coordinator, the join is currently also executed on the coordinator. Here's
an example query:
{noformat}
select count(*) from
  (select rank() over (order by l_quantity) x from lineitem) dt1
     inner join
  (select rank() over (order by l_shipdate) y from lineitem) dt2
     on dt1.x = dt2.y
{noformat}
This can result in poor performance, since the join inputs can be quite large.
Ideally, we want to redistribute both intermediate results after rank() has
been computed and do the join on the executor nodes.

A similar scenario is a join of subqueries that each have ORDER BY with LIMIT.
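
For illustration, a query of that shape might look like the following sketch.
It reuses the lineitem table from the example above; the join column and the
limit values are hypothetical and chosen only to make the join inputs large.
{noformat}
-- Hypothetical example: both join inputs are ORDER BY ... LIMIT subqueries.
select count(*) from
  (select l_orderkey from lineitem order by l_quantity limit 1000000) dt1
     inner join
  (select l_orderkey from lineitem order by l_shipdate limit 1000000) dt2
     on dt1.l_orderkey = dt2.l_orderkey
{noformat}
Running EXPLAIN on such a query should show which fragment the join is
assigned to.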



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

