Tim Armstrong created IMPALA-10025:
--------------------------------------

             Summary: Avoid rebuilding in-memory heap during output phase of top-n
                 Key: IMPALA-10025
                 URL: https://issues.apache.org/jira/browse/IMPALA-10025
             Project: IMPALA
          Issue Type: Sub-task
          Components: Backend
            Reporter: Tim Armstrong
            Assignee: Tim Armstrong


In the patch for IMPALA-9853, we reuse some code in the output phase that 
necessitated building the in-memory heap from the sorter's output. This has 
some inherent overhead that gets worse for larger limits and/or partition 
counts.

It would be better to have the sorter do a full sort on the partition and 
ORDER BY columns and then apply the limit while streaming the results back 
from the sorter. In combination with IMPALA-10023, this would let us degrade 
gracefully to something closer to a regular sort, and would probably let us 
raise ANALYTIC_PUSHDOWN_THRESHOLD.
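
As a rough illustration of the proposal (not Impala's actual sorter or top-n 
code), the following standalone C++ sketch sorts rows on a partition key and 
an order-by key, then applies a per-partition limit in a single streaming pass 
without building a heap. The Row struct, its two integer fields, and the 
function name SortThenLimitPerPartition are all hypothetical and exist only 
for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical row layout: one partition key and one order-by key.
struct Row {
  int partition_key;
  int order_key;
};

// Fully sorts the rows on (partition_key, order_key), then makes one
// streaming pass that keeps at most `limit` rows per partition.
// No in-memory heap is built at any point.
std::vector<Row> SortThenLimitPerPartition(std::vector<Row> rows,
                                           std::size_t limit) {
  std::stable_sort(rows.begin(), rows.end(), [](const Row& a, const Row& b) {
    return std::tie(a.partition_key, a.order_key) <
           std::tie(b.partition_key, b.order_key);
  });

  std::vector<Row> out;
  std::size_t emitted_in_partition = 0;
  for (std::size_t i = 0; i < rows.size(); ++i) {
    // A change in partition key marks the start of a new partition,
    // so the per-partition counter is reset.
    if (i == 0 || rows[i].partition_key != rows[i - 1].partition_key) {
      emitted_in_partition = 0;
    }
    // Rows beyond the limit are skipped as they stream past.
    if (emitted_in_partition < limit) {
      out.push_back(rows[i]);
      ++emitted_in_partition;
    }
  }
  return out;
}
```

The streaming pass only needs a counter and the previous row's partition key, 
so its cost does not grow with the limit or the number of partitions the way 
rebuilding a heap does.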



--
This message was sent by Atlassian Jira
(v8.3.4#803005)