[ https://issues.apache.org/jira/browse/IMPALA-5004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16678633#comment-16678633 ]
ASF subversion and git services commented on IMPALA-5004:
---------------------------------------------------------
Commit 98d923243f8bc95d66e497f4d8a15db57af32663 in impala's branch
refs/heads/master from stakiar
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=98d9232 ]
IMPALA-5004: Switch to sorting node for large TopN queries
Adds a new query option 'topn_bytes_limit' that places a limit on the
number of estimated bytes that a TopN operator can process. If the
Impala planner estimates that a TopN operator will process more bytes
than this limit, it will replace the TopN operator with a sort operator.
Since the TopN operator cannot spill to disk, it has to buffer everything
in memory. This can cause frequent OOM issues when running with a large
limit + offset. Switching to a sort operator allows Impala to spill to
disk. We prefer to use the TopN operator when possible as it has better
performance than the sort operator for 'order by limit [offset]' queries.
The default limit is set to 512MB and is based on micro-benchmarking the
topn vs. sort operator for various limits (see the JIRA for full details).
The default is set to an intentionally high value in order to avoid
performance regressions.
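The operator choice described above can be sketched roughly as follows. This
is an illustration only, not the planner code from the commit: the function
name 'choose_operator', its parameters, and the byte estimate of
(limit + offset) * average row size are all assumptions made for the example.
Only the 512MB default and the "more bytes than this limit" comparison come
from the commit message.

```python
# Default from the commit message: 512MB.
DEFAULT_TOPN_BYTES_LIMIT = 512 * 1024 * 1024


def choose_operator(limit, offset, avg_row_bytes,
                    topn_bytes_limit=DEFAULT_TOPN_BYTES_LIMIT):
    """Return "TOP-N" or "SORT" for an 'order by limit [offset]' query.

    Hypothetical helper: the real planner's estimate may be computed
    differently.
    """
    # Assumption: TopN buffers limit + offset rows in memory, so the
    # estimated bytes processed scale with that row count.
    estimated_bytes = (limit + offset) * avg_row_bytes
    # Switch to the sort operator (which can spill to disk) only when the
    # estimate exceeds the limit; otherwise keep the faster TopN operator.
    if estimated_bytes > topn_bytes_limit:
        return "SORT"
    return "TOP-N"


# A small limit stays on TopN; a very large limit switches to sort.
print(choose_operator(limit=10, offset=0, avg_row_bytes=100))
print(choose_operator(limit=10_000_000, offset=0, avg_row_bytes=100))
```

Under these assumptions, lowering 'topn_bytes_limit' makes the switch to the
sort operator happen at smaller limit + offset values, trading some
performance for the ability to spill.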
Testing:
* Added a new planner test to functional-planner/ to validate that
'topn_bytes_limit' properly switches between topn and sort operators.
Change-Id: I34c9db33c9302b55e9978f53f9c7061f2806c8a9
Reviewed-on: http://gerrit.cloudera.org:8080/11698
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Tim Armstrong <[email protected]>
> Switch to sorting node for large TopN queries
> ---------------------------------------------
>
> Key: IMPALA-5004
> URL: https://issues.apache.org/jira/browse/IMPALA-5004
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Affects Versions: Impala 2.9.0
> Reporter: Lars Volker
> Assignee: Sahil Takiar
> Priority: Major
>
> As explained by [~tarmstrong] in IMPALA-4995:
> bq. We should also consider switching to the sort operator for large limits.
> This allows it to spill. The memory requirements for TopN also are
> problematic for large limits, since it would allocate large vectors that are
> untracked and also require a large amount of contiguous memory.
> There's already logic to select TopN vs. Sort:
> [planner/SingleNodePlanner.java#L289|https://github.com/apache/incubator-impala/blob/master/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java#L289]