Tim Armstrong created IMPALA-10025:
--------------------------------------
Summary: Avoid rebuilding in-memory heap during output phase of top-n
Key: IMPALA-10025
URL: https://issues.apache.org/jira/browse/IMPALA-10025
Project: IMPALA
Issue Type: Sub-task
Components: Backend
Reporter: Tim Armstrong
Assignee: Tim Armstrong
The patch for IMPALA-9853 reuses some code in the output phase, which
requires building the in-memory heap from the sorter's output. This adds
overhead that grows with larger limits and/or partition counts.
It would be better to have the sorter do a full sort on the partition and
order by columns and then apply the limit while streaming the results back
from the sorter. In combination with IMPALA-10023, this would let us degrade
gracefully to something closer to a regular sort and would probably let us
raise ANALYTIC_PUSHDOWN_THRESHOLD.
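A minimal sketch of the streaming idea, not taken from the Impala code base:
once rows arrive fully sorted on the partition and order by columns, the
limit can be applied with constant state (the current partition and a
counter), so no heap needs to be rebuilt. The Row layout and the names
StreamingPartitionLimit, Accept and ApplyLimit are hypothetical, and the
sketch assumes a plain row-count limit per partition, ignoring ties.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical row layout: one partition key and one order by key.
struct Row {
  int64_t partition_key;
  int64_t order_key;
};

// Applies a per-partition row limit to input that is already sorted on
// (partition_key, order_key). State is O(1): the current partition and the
// number of rows emitted for it so far.
class StreamingPartitionLimit {
 public:
  explicit StreamingPartitionLimit(std::size_t limit) : limit_(limit) {}

  // Returns true if the row should be passed on to the consumer.
  bool Accept(const Row& row) {
    if (!have_partition_ || row.partition_key != current_partition_) {
      // First row of a new partition: reset the counter.
      have_partition_ = true;
      current_partition_ = row.partition_key;
      emitted_ = 0;
    }
    if (emitted_ >= limit_) return false;  // Partition already at its limit.
    ++emitted_;
    return true;
  }

 private:
  const std::size_t limit_;
  bool have_partition_ = false;
  int64_t current_partition_ = 0;
  std::size_t emitted_ = 0;
};

// Convenience helper for the sketch: filters one sorted batch of rows.
std::vector<Row> ApplyLimit(const std::vector<Row>& sorted_rows,
                            std::size_t limit) {
  StreamingPartitionLimit limiter(limit);
  std::vector<Row> out;
  for (const Row& row : sorted_rows) {
    if (limiter.Accept(row)) out.push_back(row);
  }
  return out;
}
```

The cost per row here is one comparison and one counter update regardless of
the limit or the number of partitions, which is the property that would make
degrading to a regular sort graceful.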
--
This message was sent by Atlassian Jira
(v8.3.4#803005)