Rajesh Balamohan created HIVE-24207:
---------------------------------------
             Summary: LimitOperator can leverage ObjectCache to bail out quickly
                 Key: HIVE-24207
                 URL: https://issues.apache.org/jira/browse/HIVE-24207
             Project: Hive
          Issue Type: Improvement
            Reporter: Rajesh Balamohan


{noformat}
select ss_sold_date_sk from store_sales, date_dim
where date_dim.d_year in (1998,1998+1,1998+2)
  and store_sales.ss_sold_date_sk = date_dim.d_date_sk
limit 100;

select distinct ss_sold_date_sk from store_sales, date_dim
where date_dim.d_year in (1998,1998+1,1998+2)
  and store_sales.ss_sold_date_sk = date_dim.d_date_sk
limit 100;
{noformat}

Queries like the above generate a large number of map tasks. Currently, these tasks do not bail out once enough rows have been generated to satisfy the limit.

It would be good to use ObjectCache to retain the number of records generated so far. LimitOperator/VectorLimitOperator could then bail out of later tasks in the operator's init phase itself.

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorLimitOperator.java#L57

https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java#L58
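
To make the proposal concrete, here is a minimal, self-contained sketch of the idea. It does not use Hive's classes: `SharedCache` is a hypothetical stand-in for a cache shared by tasks of the same query running in one container, and `SketchLimitOperator`, `runTask`, and the cache key are illustrative names only. It assumes that successive tasks can see the same counter, which would only hold for tasks that actually share a cache instance.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

final class LimitBailOutSketch {

    /** Hypothetical stand-in for a cache shared by tasks of one query (not Hive's ObjectCache). */
    static final class SharedCache {
        private final ConcurrentMap<String, AtomicLong> counters = new ConcurrentHashMap<>();

        /** Returns the shared row counter for the given key, creating it on first use. */
        AtomicLong counterFor(String key) {
            return counters.computeIfAbsent(key, k -> new AtomicLong());
        }
    }

    /** Simplified limit operator; an illustration, not Hive's LimitOperator. */
    static final class SketchLimitOperator {
        private final int limit;
        private final AtomicLong rowsSoFar; // shared across tasks through the cache
        private boolean done;
        private long forwarded;

        SketchLimitOperator(SharedCache cache, String cacheKey, int limit) {
            this.limit = limit;
            this.rowsSoFar = cache.counterFor(cacheKey);
        }

        /** Init phase: bail out immediately if earlier tasks already satisfied the limit. */
        void initialize() {
            done = rowsSoFar.get() >= limit;
        }

        /** Forwards the row unless the shared limit has been reached; returns true if forwarded. */
        boolean process(Object row) {
            if (done) {
                return false;
            }
            // The counter may overshoot the limit by one per task that hits it;
            // the number of forwarded rows never exceeds the limit.
            if (rowsSoFar.incrementAndGet() > limit) {
                done = true;
                return false;
            }
            forwarded++;
            return true;
        }

        boolean isDone() {
            return done;
        }

        long forwardedRows() {
            return forwarded;
        }
    }

    /** Simulates one task that reads up to inputRows rows through the limit operator. */
    static long runTask(SharedCache cache, String key, int limit, int inputRows) {
        SketchLimitOperator op = new SketchLimitOperator(cache, key, limit);
        op.initialize();
        for (int i = 0; i < inputRows && !op.isDone(); i++) {
            op.process(i);
        }
        return op.forwardedRows();
    }

    public static void main(String[] args) {
        SharedCache cache = new SharedCache();
        String key = "query1_limit"; // hypothetical cache key
        for (int task = 1; task <= 3; task++) {
            long n = runTask(cache, key, 100, 60);
            System.out.println("task " + task + " forwarded " + n + " rows");
        }
    }
}
```

In this sketch, tasks that start after the limit has been reached do no row processing at all, which is the early exit the issue asks for. Tasks in other containers would not see the counter and would still run to their own local limit, so the final limit on the downstream side remains necessary.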