[ https://issues.apache.org/jira/browse/HIVE-11133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979385#comment-15979385 ]
Sahil Takiar edited comment on HIVE-11133 at 4/21/17 10:08 PM:
---------------------------------------------------------------
[~xuefuz]
The query:
{code}
set hive.optimize.ppd=true;
set hive.ppd.remove.duplicatefilters=true;
set hive.spark.dynamic.partition.pruning=true;
set hive.optimize.metadataonly=false;
set hive.optimize.index.filter=true;
set hive.strict.checks.cartesian.product=false;
set hive.spark.explain.user=true;
set hive.spark.dynamic.partition.pruning=true;
EXPLAIN select count(*) from srcpart where srcpart.ds in (select
max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart);
{code}
prints:
{code}
Plan optimized by CBO.
Vertex dependency in root stage
Reducer 10 <- Map 9 (GROUP)
Reducer 11 <- Reducer 10 (GROUP), Reducer 13 (GROUP)
Reducer 13 <- Map 12 (GROUP)
Vertex dependency in root stage
Reducer 2 <- Map 1 (PARTITION-LEVEL SORT), Reducer 6 (PARTITION-LEVEL SORT)
Reducer 3 <- Reducer 2 (GROUP)
Reducer 5 <- Map 4 (GROUP)
Reducer 6 <- Reducer 5 (GROUP), Reducer 8 (GROUP)
Reducer 8 <- Map 7 (GROUP)
Stage-0
Fetch Operator
limit:-1
Stage-1
Reducer 3
File Output Operator [FS_34]
Group By Operator [GBY_32] (rows=1 width=8)
Output:["_col0"],aggregations:["count(VALUE._col0)"]
<-Reducer 2 [GROUP]
GROUP [RS_31]
Group By Operator [GBY_30] (rows=1 width=8)
Output:["_col0"],aggregations:["count()"]
Join Operator [JOIN_28] (rows=2200 width=10)
condition
map:[{"":"{\"type\":\"Inner\",\"left\":0,\"right\":1}"}],keys:{"0":"_col0","1":"_col0"}
<-Map 1 [PARTITION-LEVEL SORT]
PARTITION-LEVEL SORT [RS_26]
PartitionCols:_col0
Select Operator [SEL_2] (rows=2000 width=10)
Output:["_col0"]
TableScan [TS_0] (rows=2000 width=10)
default@srcpart,srcpart,Tbl:COMPLETE,Col:NONE
<-Reducer 6 [PARTITION-LEVEL SORT]
PARTITION-LEVEL SORT [RS_27]
PartitionCols:_col0
Group By Operator [GBY_24] (rows=1 width=184)
Output:["_col0"],keys:KEY._col0
<-Reducer 5 [GROUP]
GROUP [RS_23]
PartitionCols:_col0
Group By Operator [GBY_22] (rows=2 width=184)
Output:["_col0"],keys:_col0
Filter Operator [FIL_9] (rows=1 width=184)
predicate:_col0 is not null
Group By Operator [GBY_7] (rows=1 width=184)
Output:["_col0"],aggregations:["max(VALUE._col0)"]
<-Map 4 [GROUP]
GROUP [RS_6]
Group By Operator [GBY_5] (rows=1 width=184)
Output:["_col0"],aggregations:["max(ds)"]
Select Operator [SEL_4] (rows=2000 width=10)
Output:["ds"]
TableScan [TS_3] (rows=2000 width=10)
default@srcpart,srcpart,Tbl:COMPLETE,Col:NONE
<-Reducer 8 [GROUP]
GROUP [RS_23]
PartitionCols:_col0
Group By Operator [GBY_22] (rows=2 width=184)
Output:["_col0"],keys:_col0
Filter Operator [FIL_17] (rows=1 width=184)
predicate:_col0 is not null
Group By Operator [GBY_15] (rows=1 width=184)
Output:["_col0"],aggregations:["min(VALUE._col0)"]
<-Map 7 [GROUP]
GROUP [RS_14]
Group By Operator [GBY_13] (rows=1 width=184)
Output:["_col0"],aggregations:["min(ds)"]
Select Operator [SEL_12] (rows=2000 width=10)
Output:["ds"]
TableScan [TS_11] (rows=2000 width=10)
default@srcpart,srcpart,Tbl:COMPLETE,Col:NONE
Stage-2
Reducer 11
{code}
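So the output contains two sections headed {{Vertex dependency in root stage}}.
The first appears to cover the Stage-2 vertices (Reducer 11 and its inputs) and
the second the Stage-1 vertices (Reducer 3 and its inputs). I haven't checked
whether this can also happen with Hive-on-Tez, but it looks like an existing bug
in the user-level explain code.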
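To check other explain outputs for the same symptom, something like the following could be used. It is only a sketch: the helper names are made up for illustration and are not part of Hive, and it assumes the explain output has already been captured as plain text. It counts lines that consist solely of the header string.

```python
HEADER = "Vertex dependency in root stage"


def header_line_numbers(explain_text, header=HEADER):
    """Return the 1-based line numbers on which the header line appears."""
    return [
        lineno
        for lineno, line in enumerate(explain_text.splitlines(), start=1)
        if line.strip() == header
    ]


def has_duplicate_header(explain_text, header=HEADER):
    """True if the header shows up more than once in the explain output."""
    return len(header_line_numbers(explain_text, header)) > 1


# First lines of the output pasted in this comment.
sample = """\
Plan optimized by CBO.
Vertex dependency in root stage
Reducer 10 <- Map 9 (GROUP)
Reducer 11 <- Reducer 10 (GROUP), Reducer 13 (GROUP)
Reducer 13 <- Map 12 (GROUP)
Vertex dependency in root stage
Reducer 2 <- Map 1 (PARTITION-LEVEL SORT), Reducer 6 (PARTITION-LEVEL SORT)
Reducer 3 <- Reducer 2 (GROUP)
"""

if __name__ == "__main__":
    print(header_line_numbers(sample))
    print(has_duplicate_header(sample))
```

Running the same check on the Hive-on-Tez output for this query would show whether the duplicate header is specific to the Spark path.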
> Support hive.explain.user for Spark
> -----------------------------------
>
> Key: HIVE-11133
> URL: https://issues.apache.org/jira/browse/HIVE-11133
> Project: Hive
> Issue Type: Sub-task
> Components: Spark
> Reporter: Mohit Sabharwal
> Assignee: Sahil Takiar
> Attachments: HIVE-11133.1.patch, HIVE-11133.2.patch,
> HIVE-11133.3.patch, HIVE-11133.4.patch, HIVE-11133.5.patch,
> HIVE-11133.6.patch, HIVE-11133.7.patch, HIVE-11133.8.patch
>
>
> User-friendly explain output ({{set hive.explain.user=true}}) should support
> Spark as well.
> Once supported, we should also enable related q-tests like {{explainuser_1.q}}.