[ https://issues.apache.org/jira/browse/HIVE-24207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17851965#comment-17851965 ]
Sungwoo Park commented on HIVE-24207:
-------------------------------------
Seonggon created HIVE-28281 to report the problem in case 1.
For case 2, it's hard to reproduce the problem, but the bug seems obvious
because two speculative task attempts are not supposed to update a common
counter for the same LimitOperator.
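
To make the concern in case 2 concrete, here is a minimal sketch in plain Java.
It is not Hive code: the CACHE map is a hypothetical stand-in for a
per-container object cache, and the key names and row counts are invented for
illustration. It assumes two speculative attempts of the same task run in the
same container, read the same 150 input rows, and look up the limit counter
under the same key.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedLimitCounterSketch {
    // Hypothetical stand-in for a per-container object cache (not Hive's ObjectCache API).
    static final Map<String, AtomicInteger> CACHE = new ConcurrentHashMap<>();

    /**
     * Forwards up to inputRows rows, stopping once the counter stored under
     * cacheKey reaches limit. Returns the number of rows this call forwarded.
     */
    static int runAttempt(String cacheKey, int inputRows, int limit) {
        AtomicInteger counter = CACHE.computeIfAbsent(cacheKey, k -> new AtomicInteger());
        int forwarded = 0;
        for (int i = 0; i < inputRows; i++) {
            if (counter.get() >= limit) {
                break; // limit already reached according to the cached counter
            }
            counter.incrementAndGet();
            forwarded++;
        }
        return forwarded;
    }

    public static void main(String[] args) {
        int limit = 100;

        // Shared key: attempt 0 forwards 60 rows, then the speculative
        // attempt 1 starts, then attempt 0 resumes with its remaining 90 rows.
        int a0First = runAttempt("limit-op", 60, limit);
        int a1 = runAttempt("limit-op", 150, limit);
        int a0Rest = runAttempt("limit-op", 90, limit);
        System.out.println("shared key:   attempt0=" + (a0First + a0Rest) + " attempt1=" + a1);
        // prints "shared key:   attempt0=60 attempt1=40"

        // Separate keys (hypothetical fix): each attempt counts only its own rows.
        int b0 = runAttempt("limit-op/attempt-0", 150, limit);
        int b1 = runAttempt("limit-op/attempt-1", 150, limit);
        System.out.println("separate key: attempt0=" + b0 + " attempt1=" + b1);
        // prints "separate key: attempt0=100 attempt1=100"
    }
}
```

With the shared key, neither attempt forwards 100 rows even though the input
has 150, so whichever attempt is committed yields fewer rows than the limit.
Whether this is exactly how the problem would surface in Hive is an assumption;
the sketch only shows why a counter shared across attempts is unsafe.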
> LimitOperator can leverage ObjectCache to bail out quickly
> ----------------------------------------------------------
>
> Key: HIVE-24207
> URL: https://issues.apache.org/jira/browse/HIVE-24207
> Project: Hive
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Assignee: László Bodor
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0-alpha-1
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> {noformat}
> select ss_sold_date_sk from store_sales, date_dim where date_dim.d_year in
> (1998,1998+1,1998+2) and store_sales.ss_sold_date_sk = date_dim.d_date_sk
> limit 100;
> select distinct ss_sold_date_sk from store_sales, date_dim where
> date_dim.d_year in (1998,1998+1,1998+2) and store_sales.ss_sold_date_sk =
> date_dim.d_date_sk limit 100;
> {noformat}
> Queries like the above generate a large number of map tasks. Currently they
> don't bail out after generating enough data.
> It would be good to make use of ObjectCache & retain the number of records
> generated. LimitOperator/VectorLimitOperator can bail out for the later tasks
> in the operator's init phase itself.
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorLimitOperator.java#L57
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java#L58