[
https://issues.apache.org/jira/browse/IMPALA-9983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17169450#comment-17169450
]
ASF subversion and git services commented on IMPALA-9983:
---------------------------------------------------------
Commit ecfc1af0db18de7ad47406b9bff3dcdfdb9d2a05 in impala's branch
refs/heads/master from Aman Sinha
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=ecfc1af ]
IMPALA-9983 : Pushdown limit to analytic sort operator
This patch pushes the LIMIT from a top level Sort down to
the Sort below an Analytic operator when it is safe to do
so. There are several qualifying checks that are done. The
optimization is done at the time of creating the top level
Sort in the single node planner. When the pushdown is
applicable, the analytic sort is converted to a TopN sort.
Further, this is split into a bottom TopN and an upper
TopN separated by a hash partition exchange. This
ensures that the limit is applied as early as possible
before hash partitioning.
Fixed couple of additional related issues uncovered as a
result of limit pushdown:
- Changed the analytic sort's partition-by expr sort
semantic from NULLS FIRST to NULLS LAST to ensure
correctness in the presence of limit.
- The LIMIT on the analytic sort node was causing it to
be treated as a merging point in the distributed planner.
Fixed it by introducing an api allowPartitioned() in the
PlanNode.
Testing:
- Ran PlannerTest and updated several EXPLAIN plans.
- Added Planner tests for both positive and negative cases of
limit pushdown.
- Ran end-to-end TPC-DS queries. Specifically tested
TPC-DS q67 for limit pushdown and result correctness.
- Added targeted end-to-end tests using TPC-H dataset.
Change-Id: Ib39f46a7bb75a34466eef7f91ddc25b6e6c99284
Reviewed-on: http://gerrit.cloudera.org:8080/16219
Reviewed-by: Tim Armstrong <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Push limit from a top level sort onto analytic sort when applicable
> -------------------------------------------------------------------
>
> Key: IMPALA-9983
> URL: https://issues.apache.org/jira/browse/IMPALA-9983
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Affects Versions: Impala 3.4.0
> Reporter: Aman Sinha
> Assignee: Aman Sinha
> Priority: Major
>
> For queries of the following type:
> {noformat}
> select * from (select l_partkey, l_quantity,
> rank() over (partition by l_partkey order by l_quantity desc)
> rk
> from lineitem) dt
> where rk <= 100
> order by l_partkey, l_quantity, rk
> limit 100
> {noformat}
> the limit 100 from the outer order by can be pushed down to the analytic sort
> that is done below the AnalyticEval operator. The reason is there are
> effectively 2 limits:
> PARTITION BY l_partkey ORDER BY l_quantity LIMIT PER PARTITION 100
> ORDER BY l_partkey .... LIMIT 100
> and together they imply
> ORDER BY l_partkey, l_quantity LIMIT 100
> For the limit pushdown to work, the partition-by exprs must be a leading
> prefix of the order-by exprs. Also, other qualifying conditions must be met
> based on the above pattern.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]