[
https://issues.apache.org/jira/browse/IMPALA-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418979#comment-17418979
]
ASF subversion and git services commented on IMPALA-2581:
---------------------------------------------------------
Commit 39cc4b6bf45a172c3fdcd6a9cc42eaadfcf3ae71 in impala's branch
refs/heads/master from liuyao
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=39cc4b6 ]
IMPALA-2581: LIMIT can be propagated down into some aggregations
This patch has two parts:
1. Push the limit down to the pre-aggregation when both of the
   following conditions hold:
   a) the aggregation node has no aggregate functions
   b) the aggregation node has no predicates
2. Finish the aggregation once the number of unique keys in the hash
   table exceeds the limit.
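
The two parts above can be illustrated with a minimal Python sketch. It is
not Impala's code: the function names (can_push_limit_to_preagg,
distinct_with_limit) and their parameters are hypothetical, and a dict
stands in for the hash table. The sketch stops as soon as the limit is
reached; the exact threshold comparison used by the patch may differ.

```python
from typing import Dict, Hashable, Iterable, List


def can_push_limit_to_preagg(num_agg_functions: int, num_predicates: int) -> bool:
    """Hypothetical eligibility check mirroring conditions (a) and (b)."""
    return num_agg_functions == 0 and num_predicates == 0


def distinct_with_limit(rows: Iterable[Hashable], limit: int) -> List[Hashable]:
    """Collect distinct keys, stopping once `limit` unique keys are held."""
    if limit <= 0:
        return []
    seen: Dict[Hashable, None] = {}  # stand-in for the aggregation hash table
    for key in rows:
        if key not in seen:
            seen[key] = None
            if len(seen) >= limit:
                break  # enough unique keys: stop consuming input
    return list(seen)  # dicts preserve insertion order
```

With a lazy input (a generator, for example), the loop stops pulling rows
as soon as enough unique keys are held, which is the source of the
speedup when the limit is small.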
Sample query:
  SELECT DISTINCT f FROM t LIMIT n
The LIMIT can be passed all the way down to the pre-aggregation, so the
aggregation can stop after collecting about n distinct values instead
of finding all of them. This gives a very large speedup for such
queries on large tables when n is small.
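
As a rough illustration of why pushing the LIMIT into the pre-aggregation
is safe for such a query, the sketch below models a two-phase plan in
Python. Everything in it is an assumption made for illustration: the
names preaggregate, merge_aggregate and select_distinct_limit are
hypothetical, and lists of keys stand in for the data scanned on each
node.

```python
from typing import Hashable, Iterable, List


def preaggregate(partition: Iterable[Hashable], limit: int) -> List[Hashable]:
    """Per-node phase: emit at most `limit` distinct keys, then stop scanning."""
    seen = set()
    out: List[Hashable] = []
    for key in partition:
        if len(out) >= limit:
            break
        if key not in seen:
            seen.add(key)
            out.append(key)
    return out


def merge_aggregate(partials: Iterable[List[Hashable]], limit: int) -> List[Hashable]:
    """Final phase: de-duplicate across nodes and apply the limit again."""
    seen = set()
    out: List[Hashable] = []
    for partial in partials:
        for key in partial:
            if key not in seen:
                seen.add(key)
                out.append(key)
                if len(out) >= limit:
                    return out
    return out


def select_distinct_limit(partitions: Iterable[Iterable[Hashable]], limit: int) -> List[Hashable]:
    """Model of SELECT DISTINCT f FROM t LIMIT n over partitioned input."""
    if limit <= 0:
        return []
    return merge_aggregate((preaggregate(p, limit) for p in partitions), limit)
```

Each partition has to be allowed up to n keys rather than a share of n,
because different partitions may hold the same values and the merge
phase removes those duplicates.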
Testing:
Added the test targeted-perf/queries/aggregation.test
Passed the core tests
Change-Id: I930a6cb203615acfc03f23118d1bc1f0ea360995
Reviewed-on: http://gerrit.cloudera.org:8080/17821
Reviewed-by: Qifan Chen <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Push down LIMIT past DISTINCT
> ------------------------------
>
> Key: IMPALA-2581
> URL: https://issues.apache.org/jira/browse/IMPALA-2581
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Affects Versions: Impala 2.5.0
> Reporter: Jim Apple
> Assignee: liuyao
> Priority: Minor
> Labels: performance
>
> In a table t with a column x with no null values, "SELECT DISTINCT x FROM t
> LIMIT 1" should be roughly instant. Instead, it finds *all* the distinct
> values, then returns one of them.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)