[
https://issues.apache.org/jira/browse/HIVE-24207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848432#comment-17848432
]
Sungwoo Park commented on HIVE-24207:
-------------------------------------
[~abstractdog] Hi, I have a couple of questions about this optimization.
1. An operator tree can contain multiple LimitOperators in general. It seems
that this optimization works only if the LimitOperator has a single child
operator, which must be either an RS or a TerminalOperator. In other words, a
vertex should contain at most one LimitOperator, and it should be the last
operator before emitting final records. Do you know if this property is
guaranteed by the Hive compiler?
2. This optimization may not work correctly if speculative execution is enabled
or if multiple task attempts are executed in the same LLAP daemon. Or does this
optimization assume that speculative execution is disabled?
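To make the concern in question 2 concrete, here is a minimal, self-contained Java model of the idea (not Hive's code): a per-vertex row counter shared through a process-wide cache and checked in the operator's init phase. The cache map, the `SharedLimit` class, the `"Map 1"` key and the row counts are all hypothetical. The model assumes the shared counter counts rows forwarded by every task attempt, whether or not that attempt is later committed; whether the actual patch counts rows this way is not confirmed here.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

public class SharedLimitSketch {
    // Hypothetical stand-in for a daemon-wide object cache shared by all task
    // attempts running in the same process (NOT Hive's ObjectCache API).
    static final ConcurrentMap<String, AtomicLong> CACHE = new ConcurrentHashMap<>();

    // Hypothetical limit operator whose emitted-row count is shared via CACHE.
    static final class SharedLimit {
        private final AtomicLong emitted;
        private final long limit;

        SharedLimit(String vertexKey, long limit) {
            this.emitted = CACHE.computeIfAbsent(vertexKey, k -> new AtomicLong());
            this.limit = limit;
        }

        // Init-phase check: skip the task if earlier attempts already emitted enough rows.
        boolean bailOutAtInit() {
            return emitted.get() >= limit;
        }

        // Reserves a slot in the shared counter; true if the row may be forwarded.
        boolean accept() {
            return emitted.incrementAndGet() <= limit;
        }
    }

    // Runs one task attempt over `rows` input rows and returns how many it forwarded.
    static long runAttempt(String vertexKey, long limit, long rows) {
        SharedLimit op = new SharedLimit(vertexKey, limit);
        if (op.bailOutAtInit()) {
            return 0;
        }
        long forwarded = 0;
        for (long i = 0; i < rows; i++) {
            if (op.accept()) {
                forwarded++;
            } else {
                break; // shared limit reached: stop reading input
            }
        }
        return forwarded;
    }

    public static void main(String[] args) {
        String key = "Map 1"; // hypothetical per-vertex cache key
        long limit = 100;

        long task0 = runAttempt(key, limit, 60);     // original attempt of task 0
        long task0Spec = runAttempt(key, limit, 60); // speculative attempt re-reads the same split
        long task1 = runAttempt(key, limit, 500);    // a later task of the same vertex

        // Only one attempt of task 0 is committed; assume the original attempt wins.
        long committed = task0 + task1;
        System.out.println("speculative attempt forwarded " + task0Spec);
        System.out.println("task 1 forwarded " + task1);
        System.out.println("committed rows: " + committed + " (limit " + limit + ")");
    }
}
```

In this model, task 1 bails out at init and only 60 rows are committed even though the limit is 100 and more input is available: the speculative attempt's 40 duplicate rows pushed the shared counter past the limit. Because the counter lives in one process, the model also does not see attempts running in other daemons.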
> LimitOperator can leverage ObjectCache to bail out quickly
> ----------------------------------------------------------
>
> Key: HIVE-24207
> URL: https://issues.apache.org/jira/browse/HIVE-24207
> Project: Hive
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Assignee: László Bodor
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0-alpha-1
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> {noformat}
> select ss_sold_date_sk from store_sales, date_dim where date_dim.d_year in
> (1998,1998+1,1998+2) and store_sales.ss_sold_date_sk = date_dim.d_date_sk
> limit 100;
> select distinct ss_sold_date_sk from store_sales, date_dim where
> date_dim.d_year in (1998,1998+1,1998+2) and store_sales.ss_sold_date_sk =
> date_dim.d_date_sk limit 100;
> {noformat}
> Queries like the above generate a large number of map tasks. Currently they
> don't bail out after generating enough data.
> It would be good to make use of ObjectCache & retain the number of records
> generated. LimitOperator/VectorLimitOperator can bail out for the later tasks
> in the operator's init phase itself.
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorLimitOperator.java#L57
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java#L58
--
This message was sent by Atlassian Jira
(v8.20.10#820010)