[ https://issues.apache.org/jira/browse/HIVE-24207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HIVE-24207:
----------------------------------
Labels: pull-request-available (was: )
> LimitOperator can leverage ObjectCache to bail out quickly
> ----------------------------------------------------------
>
> Key: HIVE-24207
> URL: https://issues.apache.org/jira/browse/HIVE-24207
> Project: Hive
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Assignee: László Bodor
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> {noformat}
> select ss_sold_date_sk from store_sales, date_dim where date_dim.d_year in
> (1998,1998+1,1998+2) and store_sales.ss_sold_date_sk = date_dim.d_date_sk
> limit 100;
> select distinct ss_sold_date_sk from store_sales, date_dim where
> date_dim.d_year in (1998,1998+1,1998+2) and store_sales.ss_sold_date_sk =
> date_dim.d_date_sk limit 100;
> {noformat}
> Queries like the above generate a large number of map tasks. Currently, these
> tasks do not bail out once enough data has been generated to satisfy the limit.
> It would be good to use ObjectCache to retain the number of records generated
> so far. LimitOperator/VectorLimitOperator could then bail out in later tasks
> during the operator's init phase itself.
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorLimitOperator.java#L57
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java#L58
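
The idea in the description can be illustrated with a minimal sketch. This is not the Hive implementation or the eventual patch: SketchObjectCache is a hypothetical stand-in for Hive's ObjectCache, SketchLimitOperator is a simplified stand-in for LimitOperator, and the cache key is made up. It assumes that tasks of the same query running in one container share the cache, so a counter stored there survives from one task to the next.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a per-query cache shared by tasks in one container. */
final class SketchObjectCache {
    private final ConcurrentHashMap<String, Object> entries = new ConcurrentHashMap<>();

    /** Returns the cached value for key, creating it with loader on first use. */
    @SuppressWarnings("unchecked")
    <T> T retrieve(String key, Callable<T> loader) {
        return (T) entries.computeIfAbsent(key, k -> {
            try {
                return loader.call();
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
        });
    }
}

/** Simplified limit operator (assumption: not Hive's actual LimitOperator). */
final class SketchLimitOperator {
    private final int limit;
    private final AtomicInteger emitted; // shared by all tasks using the same cache key
    private boolean done;

    SketchLimitOperator(SketchObjectCache cache, String cacheKey, int limit) {
        this.limit = limit;
        this.emitted = cache.retrieve(cacheKey, () -> new AtomicInteger());
    }

    /** Init phase: bail out right away if earlier tasks already reached the limit. */
    void initialize() {
        done = emitted.get() >= limit;
    }

    boolean isDone() {
        return done;
    }

    /** Returns true if the row is forwarded, false if it is dropped. */
    boolean process(Object row) {
        if (done) {
            return false;
        }
        // The counter may overshoot the limit; only the ">= limit" check matters.
        if (emitted.incrementAndGet() > limit) {
            done = true;
            return false;
        }
        return true;
    }
}
```

In this sketch, a first task that forwards `limit` rows leaves the shared counter at or above the limit, so any later task built against the same cache and key reports isDone() immediately after initialize() and never processes a row. The counter is global to the container, not to the cluster, so tasks in other containers would still run until they hit the limit themselves.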