[
https://issues.apache.org/jira/browse/HIVE-17342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17597866#comment-17597866
]
Krisztian Kasa commented on HIVE-17342:
---------------------------------------
This seems to be fixed on the master branch:
The plan of the query mentioned in the description:
{code:java}
POSTHOOK: query: explain
SELECT * FROM (select avg(d_year) as y from date_dim where d_year>1999) q
WHERE 1=0
POSTHOOK: type: QUERY
POSTHOOK: Input: default@date_dim
#### A masked pattern was here ####
Plan optimized by CBO.
Stage-0
Fetch Operator
limit:0
{code}
It seems that Calcite has the functionality to transform Filter operators with
always false predicate to a relational expression which returns empty result
set. (CALCITE-1058) To exploit it Hive introduced it's implementation of this
expression (HIVE-13316):
[https://github.com/apache/hive/blob/64b731820cf462d3c03632fb4e99277ee539dd08/ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/HiveRelBuilder.java#L133]
CBO plan is
{code:java}
POSTHOOK: query: explain cbo
SELECT * FROM (select avg(d_year) as y from date_dim where d_year>1999) q
WHERE 1=0
POSTHOOK: type: QUERY
POSTHOOK: Input: default@date_dim
#### A masked pattern was here ####
CBO PLAN:
HiveSortLimit(fetch=[0])
HiveProject($f0=[/(CAST($0):DOUBLE, $1)])
HiveAggregate(group=[{}], agg#0=[sum($0)], agg#1=[count($0)])
HiveFilter(condition=[>($0, 1999)])
HiveTableScan(table=[[default, date_dim]], table:alias=[date_dim])
{code}
Since the {{HiveSortLimit(fetch=[0])}} is the root operator no execution is
needed (HIVE-7203):
[https://github.com/apache/hive/blob/64b731820cf462d3c03632fb4e99277ee539dd08/ql/src/java/org/apache/hadoop/hive/ql/parse/TaskCompiler.java#L264]
> Where condition with 1=0 should be treated similar to limit 0
> -------------------------------------------------------------
>
> Key: HIVE-17342
> URL: https://issues.apache.org/jira/browse/HIVE-17342
> Project: Hive
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Priority: Minor
>
> In some cases, queries may get executed with where condition mentioning to
> "1=0" to get schema. E.g
> {noformat}
> SELECT * FROM (select avg(d_year) as y from date_dim where d_year>1999) q
> WHERE 1=0
> {noformat}
> Currently hive executes the query; it would be good to consider this similar
> to "limit 0" which does not execute the query.
> {code}
> hive> explain SELECT * FROM (select avg(d_year) as y from date_dim where
> d_year>1999) q WHERE 1=0;
> OK
> Plan optimized by CBO.
> Vertex dependency in root stage
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> Stage-0
> Fetch Operator
> limit:-1
> Stage-1
> Reducer 2 vectorized, llap
> File Output Operator [FS_13]
> Group By Operator [GBY_12] (rows=1 width=76)
> Output:["_col0"],aggregations:["avg(VALUE._col0)"]
> <-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized, llap
> PARTITION_ONLY_SHUFFLE [RS_11]
> Group By Operator [GBY_10] (rows=1 width=76)
> Output:["_col0"],aggregations:["avg(d_year)"]
> Filter Operator [FIL_9] (rows=1 width=0)
> predicate:false
> TableScan [TS_0] (rows=1 width=0)
>
> default@date_dim,date_dim,Tbl:PARTIAL,Col:NONE,Output:["d_year"]
> {code}
> It does generate 0 splits, but does send a DAG plan to the AM and receive 0
> rows as output.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)