[
https://issues.apache.org/jira/browse/HIVE-17342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17598180#comment-17598180
]
Krisztian Kasa commented on HIVE-17342:
---------------------------------------
[~amansinha]
Found that in some cases CBO ends up with a plan having a {{HiveProject}} root
node instead of {{HiveSortLimit(fetch=[0])}}. This is translated to a
subquery at the Hive physical level, and the limit 0 optimization is not used.
Setting
{code:java}
set hive.optimize.limittranspose=true;
{code}
enables {{HiveProjectSortTransposeRule}}, which can transpose {{HiveProject}}
and {{HiveSortLimit(fetch=[0])}} so that the latter becomes the new root.
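The effect of the transposition can be illustrated with a toy model. This is plain Python, not Calcite or Hive code: the classes {{Scan}}, {{Project}}, {{SortLimit}} and the functions {{transpose_project_over_limit}} and {{is_limit_zero_root}} are made up for illustration, and the sketch assumes a pure limit with no sort keys. It only shows why a root-level check for limit 0 misses the plan until the project and the limit are swapped.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Project:
    columns: tuple
    child: "Node"


@dataclass(frozen=True)
class SortLimit:
    fetch: Optional[int]  # None means "no limit"
    child: "Node"


Node = Union[Scan, Project, SortLimit]


def transpose_project_over_limit(node):
    """Toy rewrite: Project(SortLimit(x)) -> SortLimit(Project(x)).

    Assumption: the limit carries no sort keys, so swapping the two
    nodes cannot change which rows are returned.
    """
    if isinstance(node, Project) and isinstance(node.child, SortLimit):
        limit = node.child
        return SortLimit(limit.fetch, Project(node.columns, limit.child))
    return node


def is_limit_zero_root(node):
    """A check that only inspects the root node of the plan."""
    return isinstance(node, SortLimit) and node.fetch == 0


# Hypothetical plan shape with the project as root.
before = Project(("y",), SortLimit(0, Scan("t1")))
after = transpose_project_over_limit(before)
```

In this model {{is_limit_zero_root(before)}} is false because the root is a project, while {{is_limit_zero_root(after)}} is true once the limit has been moved to the root.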
For your query:
{code:java}
POSTHOOK: query: explain cbo
select y from (select a1 y from t1 where b1 > 10) q WHERE 1=0
CBO PLAN:
HiveSortLimit(fetch=[0])
  HiveProject(y=[$0])
    HiveFilter(condition=[>($1, 10)])
      HiveTableScan(table=[[default, t1]], table:alias=[t1])
{code}
{code:java}
POSTHOOK: query: explain
select y from (select a1 y from t1 where b1 > 10) q WHERE 1=0
Plan optimized by CBO.
Stage-0
  Fetch Operator
    limit:0
{code}
> Where condition with 1=0 should be treated similar to limit 0
> -------------------------------------------------------------
>
> Key: HIVE-17342
> URL: https://issues.apache.org/jira/browse/HIVE-17342
> Project: Hive
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Assignee: Krisztian Kasa
> Priority: Minor
>
> In some cases, queries may be executed with a where condition of
> "1=0" just to get the schema, e.g.:
> {noformat}
> SELECT * FROM (select avg(d_year) as y from date_dim where d_year>1999) q
> WHERE 1=0
> {noformat}
> Currently Hive executes the query; it would be good to treat this like
> "limit 0", which does not execute the query.
> {code}
> hive> explain SELECT * FROM (select avg(d_year) as y from date_dim where
> d_year>1999) q WHERE 1=0;
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Stage-0
> Fetch Operator
> limit:-1
> Stage-1
> Reducer 2 vectorized, llap
> File Output Operator [FS_13]
> Group By Operator [GBY_12] (rows=1 width=76)
> Output:["_col0"],aggregations:["avg(VALUE._col0)"]
> <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized, llap
> PARTITION_ONLY_SHUFFLE [RS_11]
> Group By Operator [GBY_10] (rows=1 width=76)
> Output:["_col0"],aggregations:["avg(d_year)"]
> Filter Operator [FIL_9] (rows=1 width=0)
> predicate:false
> TableScan [TS_0] (rows=1 width=0)
> default@date_dim,date_dim,Tbl:PARTIAL,Col:NONE,Output:["d_year"]
> {code}
> It does generate 0 splits, but it still sends a DAG plan to the AM and
> receives 0 rows as output.