[
https://issues.apache.org/jira/browse/HIVE-11133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15977912#comment-15977912
]
Sahil Takiar commented on HIVE-11133:
-------------------------------------
The qtest in the patch has a very similar query:
{code}
select sum(hash(a.k1,a.v1,a.k2, a.v2))
from (
select src1.key as k1, src1.value as v1,
src2.key as k2, src2.value as v2 FROM
(select * FROM src WHERE src.key < 10) src1
JOIN
(select * FROM src WHERE src.key < 10) src2
SORT BY k1, v1, k2, v2
) a
{code}
It's also a mapjoin. The user-level explain output is:
{code}
Plan not optimized by CBO.
Vertex dependency in root stage
Reducer 2 <- Map 1 (PARTITION-LEVEL SORT)
Reducer 3 <- Reducer 2 (GROUP)
Stage-0
Fetch Operator
limit:-1
Stage-1
Reducer 3
File Output Operator [FS_17]
Group By Operator [GBY_15] (rows=1 width=8)
Output:["_col0"],aggregations:["sum(VALUE._col0)"]
<-Reducer 2 [GROUP]
GROUP [RS_14]
Group By Operator [GBY_13] (rows=1 width=8)
Output:["_col0"],aggregations:["sum(hash(_col0,_col1,_col2,_col3))"]
Select Operator [SEL_11] (rows=27556 width=22)
Output:["_col0","_col1","_col2","_col3"]
<-Map 1 [PARTITION-LEVEL SORT]
PARTITION-LEVEL SORT [RS_10]
Map Join Operator [MAPJOIN_20] (rows=27556 width=22)
Conds:(Inner),Output:["_col0","_col1","_col2","_col3"]
<-Select Operator [SEL_2] (rows=166 width=10)
Output:["_col0","_col1"]
Filter Operator [FIL_18] (rows=166 width=10)
predicate:(key < 10)
TableScan [TS_0] (rows=500 width=10)
default@src,src,Tbl:COMPLETE,Col:NONE,Output:["key","value"]
Map Reduce Local Work
Stage-2
Map 4
keys: [HASHTABLESINK_22]
Select Operator [SEL_5] (rows=166 width=10)
Output:["_col0","_col1"]
Filter Operator [FIL_19] (rows=166 width=10)
predicate:(key < 10)
TableScan [TS_3] (rows=500 width=10)
default@src,src,Tbl:COMPLETE,Col:NONE,Output:["key","value"]
Map Reduce Local Work
{code}
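As a rough sketch (not part of the patch), the vertex dependency lines in the output above can be turned into explicit edges to check which vertex feeds which. The function name and regular expressions here are illustrative, and the comma-separated form for multiple parents on one line is an assumption about the output format rather than something shown above.

```python
import re

# Lines copied from the "Vertex dependency in root stage" section above.
EXPLAIN_LINES = [
    "Reducer 2 <- Map 1 (PARTITION-LEVEL SORT)",
    "Reducer 3 <- Reducer 2 (GROUP)",
]

# "<child> <- <parent> (<edge type>)[, <parent> (<edge type>)...]"
# The comma-separated multi-parent form is an assumption.
EDGE_RE = re.compile(r"^(?P<child>.+?) <- (?P<parents>.+)$")
PARENT_RE = re.compile(r"(?P<name>[^,()]+?) \((?P<edge>[^)]+)\)")


def parse_vertex_dependencies(lines):
    """Return (parent, child, edge_type) tuples for each dependency line."""
    edges = []
    for line in lines:
        match = EDGE_RE.match(line.strip())
        if match is None:
            continue  # not a dependency line
        for parent in PARENT_RE.finditer(match.group("parents")):
            edges.append(
                (parent.group("name").strip(), match.group("child"), parent.group("edge"))
            )
    return edges


for parent, child, edge in parse_vertex_dependencies(EXPLAIN_LINES):
    print(f"{parent} -> {child} [{edge}]")
```

Note that only the edges listed under "Vertex dependency in root stage" are recovered this way; the Stage-2 vertex (Map 4) does not appear in those lines.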
The raw query plan looks like:
{code}
{
"STAGE DEPENDENCIES": {
"Stage-2": {
"ROOT STAGE": "TRUE"
},
"Stage-1": {
"DEPENDENT STAGES": "Stage-2"
},
"Stage-0": {
"DEPENDENT STAGES": "Stage-1"
}
},
"STAGE PLANS": {
"Stage-2": {
"Spark": {
"Vertices:": {
"Map 2": {
"Map Operator Tree:": [
{
"TableScan": {
"Output:": [
"key",
"value"
],
"_empty_": "default@myinput1,b,Tbl:COMPLETE,Col:NONE",
"Statistics:": "rows=3 width=8",
"OperatorId:": "TS_1",
"children": {
"keys:": {
"0": "key",
"1": "value",
"OperatorId:": "HASHTABLESINK_10"
}
}
}
}
],
"Local Work:": {
"Map Reduce Local Work": {
}
},
"tag:": "0"
}
}
}
},
"Stage-1": {
"Spark": {
"Vertices:": {
"Map 1": {
"Map Operator Tree:": [
{
"TableScan": {
"Output:": [
"key",
"value"
],
"_empty_": "default@myinput1,a,Tbl:COMPLETE,Col:NONE",
"Statistics:": "rows=3 width=8",
"OperatorId:": "TS_0",
"children": {
"Map Join Operator": {
"condition map:": [
{
"_empty_":
"{\"type\":\"Inner\",\"left\":0,\"right\":1}"
}
],
"input vertices:": {
"1": "Map 2"
},
"keys:": {
"0": "key",
"1": "value"
},
"Output:": [
"_col0",
"_col1",
"_col5",
"_col6"
],
"Statistics:": "rows=3 width=9",
"OperatorId:": "MAPJOIN_7",
"children": {
"Select Operator": {
"Output:": [
"_col0",
"_col1",
"_col2",
"_col3"
],
"Statistics:": "rows=3 width=9",
"OperatorId:": "SEL_8",
"children": {
"File Output Operator": {
"Statistics:": "rows=3 width=9",
"OperatorId:": "FS_6"
}
}
}
}
}
}
}
}
],
"Local Work:": {
"Map Reduce Local Work": {
}
},
"tag:": "0"
}
}
}
},
"Stage-0": {
"Fetch Operator": {
"limit:": "-1"
}
}
},
"cboInfo": "Plan not optimized by CBO due to missing feature [Less_than_equal_greater_than]."
}
{code}
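For reference, a minimal sketch of how the stage ordering could be read out of the "STAGE DEPENDENCIES" section of a raw plan like the one above. The helper name is hypothetical, the embedded JSON is only the dependency fragment copied from the plan, and splitting "DEPENDENT STAGES" on commas is an assumption for plans that list more than one parent stage.

```python
import json

# Only the "STAGE DEPENDENCIES" fragment of the raw plan above.
PLAN_JSON = """
{
  "STAGE DEPENDENCIES": {
    "Stage-2": {"ROOT STAGE": "TRUE"},
    "Stage-1": {"DEPENDENT STAGES": "Stage-2"},
    "Stage-0": {"DEPENDENT STAGES": "Stage-1"}
  }
}
"""


def stage_order(plan):
    """Return stage names ordered so that each stage follows the stages it depends on."""
    deps = {}
    for stage, info in plan["STAGE DEPENDENCIES"].items():
        raw = info.get("DEPENDENT STAGES", "")
        # Assumption: multiple parent stages are comma-separated.
        deps[stage] = [s.strip() for s in raw.split(",") if s.strip()]

    ordered, seen = [], set()

    def visit(stage):
        if stage in seen:
            return
        seen.add(stage)
        for parent in deps.get(stage, []):
            visit(parent)
        ordered.append(stage)

    for stage in deps:
        visit(stage)
    return ordered


print(stage_order(json.loads(PLAN_JSON)))
```

For this plan the hash table sink stage (Stage-2) comes first, then the map join stage (Stage-1), then the fetch (Stage-0).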
So it looks like the map -> reduce dependency is there: Map 4 (the hash table
sink operator) -> Reducer 2 (the group by). Does that sound correct?
> Support hive.explain.user for Spark
> -----------------------------------
>
> Key: HIVE-11133
> URL: https://issues.apache.org/jira/browse/HIVE-11133
> Project: Hive
> Issue Type: Sub-task
> Components: Spark
> Reporter: Mohit Sabharwal
> Assignee: Sahil Takiar
> Attachments: HIVE-11133.1.patch, HIVE-11133.2.patch,
> HIVE-11133.3.patch, HIVE-11133.4.patch, HIVE-11133.5.patch,
> HIVE-11133.6.patch, HIVE-11133.7.patch
>
>
> User-friendly explain output ({{set hive.explain.user=true}}) should support
> Spark as well.
> Once supported, we should also enable related q-tests like {{explainuser_1.q}}.