[
https://issues.apache.org/jira/browse/HIVE-16507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sahil Takiar updated HIVE-16507:
--------------------------------
Status: Patch Available (was: Open)
> Hive Explain User-Level may print out "Vertex dependency in root stage" twice
> -----------------------------------------------------------------------------
>
> Key: HIVE-16507
> URL: https://issues.apache.org/jira/browse/HIVE-16507
> Project: Hive
> Issue Type: Bug
> Reporter: Sahil Takiar
> Assignee: Sahil Takiar
> Attachments: HIVE-16507.1.patch
>
>
> User-level explain plans have a section titled {{Vertex dependency in root
> stage}} which, as the name suggests, prints out the dependencies between
> all vertices that are in the root stage.
> This logic is controlled by {{DagJsonParser#print}} and it may print out
> {{Vertex dependency in root stage}} twice.
> The logic in this method first extracts all stages and plans. It then
> iterates over all the stages, and if the stage contains any edges, it prints
> them out.
> If we want to be consistent with the statement {{Vertex dependency in root
> stage}} then we should add a check to see if the stage we are processing
> during the iteration is the root stage or not.
> Alternatively, we could print out the edges for each stage and change the
> line from {{Vertex dependency in root stage}} to {{Vertex dependency in
> [stage-id]}}.
> I'm not sure if it's possible for Hive-on-Tez to create a plan with a non-root
> stage that contains edges, but it is possible for Hive-on-Spark (support
> added for HoS in HIVE-11133).
> Example for HoS:
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.strict.checks.cartesian.product=false;
> set hive.spark.explain.user=true;
> EXPLAIN select count(*) from srcpart where srcpart.ds in (select
> max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart);
> {code}
> Prints
> {code}
> Plan optimized by CBO.
> Vertex dependency in root stage
> Reducer 10 <- Map 9 (GROUP)
> Reducer 11 <- Reducer 10 (GROUP), Reducer 13 (GROUP)
> Reducer 13 <- Map 12 (GROUP)
> Vertex dependency in root stage
> Reducer 2 <- Map 1 (PARTITION-LEVEL SORT), Reducer 6 (PARTITION-LEVEL SORT)
> Reducer 3 <- Reducer 2 (GROUP)
> Reducer 5 <- Map 4 (GROUP)
> Reducer 6 <- Reducer 5 (GROUP), Reducer 8 (GROUP)
> Reducer 8 <- Map 7 (GROUP)
> Stage-0
> Fetch Operator
> limit:-1
> Stage-1
> Reducer 3
> File Output Operator [FS_34]
> Group By Operator [GBY_32] (rows=1 width=8)
> Output:["_col0"],aggregations:["count(VALUE._col0)"]
> <-Reducer 2 [GROUP]
> GROUP [RS_31]
> Group By Operator [GBY_30] (rows=1 width=8)
> Output:["_col0"],aggregations:["count()"]
> Join Operator [JOIN_28] (rows=2200 width=10)
> condition
> map:[{"":"{\"type\":\"Inner\",\"left\":0,\"right\":1}"}],keys:{"0":"_col0","1":"_col0"}
> <-Map 1 [PARTITION-LEVEL SORT]
> PARTITION-LEVEL SORT [RS_26]
> PartitionCols:_col0
> Select Operator [SEL_2] (rows=2000 width=10)
> Output:["_col0"]
> TableScan [TS_0] (rows=2000 width=10)
> default@srcpart,srcpart,Tbl:COMPLETE,Col:NONE
> <-Reducer 6 [PARTITION-LEVEL SORT]
> PARTITION-LEVEL SORT [RS_27]
> PartitionCols:_col0
> Group By Operator [GBY_24] (rows=1 width=184)
> Output:["_col0"],keys:KEY._col0
> <-Reducer 5 [GROUP]
> GROUP [RS_23]
> PartitionCols:_col0
> Group By Operator [GBY_22] (rows=2 width=184)
> Output:["_col0"],keys:_col0
> Filter Operator [FIL_9] (rows=1 width=184)
> predicate:_col0 is not null
> Group By Operator [GBY_7] (rows=1 width=184)
> Output:["_col0"],aggregations:["max(VALUE._col0)"]
> <-Map 4 [GROUP]
> GROUP [RS_6]
> Group By Operator [GBY_5] (rows=1 width=184)
> Output:["_col0"],aggregations:["max(ds)"]
> Select Operator [SEL_4] (rows=2000 width=10)
> Output:["ds"]
> TableScan [TS_3] (rows=2000 width=10)
> default@srcpart,srcpart,Tbl:COMPLETE,Col:NONE
> <-Reducer 8 [GROUP]
> GROUP [RS_23]
> PartitionCols:_col0
> Group By Operator [GBY_22] (rows=2 width=184)
> Output:["_col0"],keys:_col0
> Filter Operator [FIL_17] (rows=1 width=184)
> predicate:_col0 is not null
> Group By Operator [GBY_15] (rows=1 width=184)
> Output:["_col0"],aggregations:["min(VALUE._col0)"]
> <-Map 7 [GROUP]
> GROUP [RS_14]
> Group By Operator [GBY_13] (rows=1 width=184)
> Output:["_col0"],aggregations:["min(ds)"]
> Select Operator [SEL_12] (rows=2000 width=10)
> Output:["ds"]
> TableScan [TS_11] (rows=2000 width=10)
> default@srcpart,srcpart,Tbl:COMPLETE,Col:NONE
> Stage-2
> Reducer 11
> {code}
> So there are two sections that say {{Vertex dependency in root stage}}.
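
For illustration, the behaviour described above and the two proposed alternatives could be sketched roughly as follows. This is a simplified model, not the actual {{DagJsonParser}} code: the Stage class, the method names, and the choice of which stage is flagged as root are all hypothetical, and the edge strings are abbreviated from the example output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class VertexDependencySketch {

    /** Hypothetical stand-in for a parsed stage; not Hive's actual class. */
    static final class Stage {
        final String id;
        final boolean root;       // assumption: the caller already knows which stage is the root
        final List<String> edges; // pre-rendered "child <- parent (TYPE)" lines

        Stage(String id, boolean root, List<String> edges) {
            this.id = id;
            this.root = root;
            this.edges = edges;
        }
    }

    static final String ROOT_HEADER = "Vertex dependency in root stage";

    /** Behaviour described in the report: the same header for every stage that has edges. */
    static List<String> printAsReported(List<Stage> stages) {
        List<String> out = new ArrayList<>();
        for (Stage s : stages) {
            if (!s.edges.isEmpty()) {
                out.add(ROOT_HEADER);
                out.addAll(s.edges);
            }
        }
        return out;
    }

    /** First proposal: print edges only when the stage being processed is the root stage. */
    static List<String> printRootOnly(List<Stage> stages) {
        List<String> out = new ArrayList<>();
        for (Stage s : stages) {
            if (s.root && !s.edges.isEmpty()) {
                out.add(ROOT_HEADER);
                out.addAll(s.edges);
            }
        }
        return out;
    }

    /** Second proposal: print edges for every stage, labelled with the stage id. */
    static List<String> printPerStage(List<Stage> stages) {
        List<String> out = new ArrayList<>();
        for (Stage s : stages) {
            if (!s.edges.isEmpty()) {
                out.add("Vertex dependency in " + s.id);
                out.addAll(s.edges);
            }
        }
        return out;
    }

    /** Sample data loosely based on the HoS example; the root flag is an assumption. */
    static List<Stage> sampleStages() {
        return Arrays.asList(
            new Stage("Stage-2", true, Arrays.asList(
                "Reducer 10 <- Map 9 (GROUP)",
                "Reducer 11 <- Reducer 10 (GROUP), Reducer 13 (GROUP)",
                "Reducer 13 <- Map 12 (GROUP)")),
            new Stage("Stage-1", false, Arrays.asList(
                "Reducer 2 <- Map 1 (PARTITION-LEVEL SORT), Reducer 6 (PARTITION-LEVEL SORT)",
                "Reducer 3 <- Reducer 2 (GROUP)")),
            new Stage("Stage-0", false, new ArrayList<String>()));
    }

    public static void main(String[] args) {
        List<Stage> stages = sampleStages();
        System.out.println("As reported:");
        for (String line : printAsReported(stages)) System.out.println("  " + line);
        System.out.println("Root stage only:");
        for (String line : printRootOnly(stages)) System.out.println("  " + line);
        System.out.println("Per stage:");
        for (String line : printPerStage(stages)) System.out.println("  " + line);
    }
}
```

The first proposal keeps the existing header accurate but hides the edges of non-root stages; the second keeps all edges visible and makes each header unambiguous, at the cost of changing the output format.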
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)