[
https://issues.apache.org/jira/browse/HIVE-17486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276526#comment-16276526
]
liyunzhang commented on HIVE-17486:
-----------------------------------
In explain.28.scan.share.true we can see that there is only one operator (TS) in Map 1, and the operators from the child of the TS down to the RS belong to other Maps (Map 12, Map 15, Map 18, Map 2, Map 6, Map 9). So the current {{M-R}} shape within one SparkTask changes to {{M-M-R}}:
{code}
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Spark
Edges:
Map 12 <- Map 1 (NONE, 1000)
Map 15 <- Map 1 (NONE, 1000)
Map 18 <- Map 1 (NONE, 1000)
Map 2 <- Map 1 (NONE, 1000)
Map 6 <- Map 1 (NONE, 1000)
Map 9 <- Map 1 (NONE, 1000)
Reducer 10 <- Map 9 (GROUP PARTITION-LEVEL SORT, 1)
Reducer 13 <- Map 12 (GROUP PARTITION-LEVEL SORT, 1)
Reducer 16 <- Map 15 (GROUP PARTITION-LEVEL SORT, 1)
Reducer 19 <- Map 18 (GROUP PARTITION-LEVEL SORT, 1)
Reducer 3 <- Map 2 (GROUP PARTITION-LEVEL SORT, 1)
Reducer 4 <- Reducer 10 (PARTITION-LEVEL SORT, 1), Reducer 13
(PARTITION-LEVEL SORT, 1), Reducer 16 (PARTITION-LEVEL SORT, 1), Reducer 19
(PARTITION-LEVEL SORT, 1), Reducer 3 (PARTITION-LEVEL SORT, 1), Reducer 7
(PARTITION-LEVEL SORT, 1)
Reducer 7 <- Map 6 (GROUP PARTITION-LEVEL SORT, 1)
DagName: root_20171204042631_0435ff7e-3f10-4c84-a5fc-dc5b607497ba:1
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: store_sales
filterExpr: ((ss_quantity BETWEEN 0 AND 5 and (ss_list_price
BETWEEN 11 AND 21 or ss_coupon_amt BETWEEN 460 AND 1460 or ss_wholesale_cost
BETWEEN 14 AND 34)) or (ss_quantity BETWEEN 6 AND 10 and (ss_list_price BETWEEN
91 AND 101 or ss_coupon_amt BETWEEN 1430 AND 2430 or ss_wholesale_cost BETWEEN
32 AND 52)) or (ss_quantity BETWEEN 11 AND 15 and (ss_list_price BETWEEN 66 AND
76 or ss_coupon_amt BETWEEN 920 AND 1920 or ss_wholesale_cost BETWEEN 4 AND
24)) or (ss_quantity BETWEEN 16 AND 20 and (ss_list_price BETWEEN 142 AND 152
or ss_coupon_amt BETWEEN 3054 AND 4054 or ss_wholesale_cost BETWEEN 80 AND
100)) or (ss_quantity BETWEEN 21 AND 25 and (ss_list_price BETWEEN 135 AND 145
or ss_coupon_amt BETWEEN 14180 AND 15180 or ss_wholesale_cost BETWEEN 38 AND
58)) or (ss_quantity BETWEEN 26 AND 30 and (ss_list_price BETWEEN 28 AND 38 or
ss_coupon_amt BETWEEN 2513 AND 3513 or ss_wholesale_cost BETWEEN 42 AND 62)))
(type: boolean)
Statistics: Num rows: 28800991 Data size: 4751513940 Basic
stats: COMPLETE Column stats: NONE
Execution mode: vectorized
Map 12
Map Operator Tree:
Filter Operator
predicate: (ss_quantity BETWEEN 16 AND 20 and (ss_list_price
BETWEEN 142 AND 152 or ss_coupon_amt BETWEEN 3054 AND 4054 or ss_wholesale_cost
BETWEEN 80 AND 100)) (type: boolean)
Statistics: Num rows: 1066701 Data size: 175981606 Basic
stats: COMPLETE Column stats: NONE
Select Operator
expressions: ss_list_price (type: double)
outputColumnNames: ss_list_price
Statistics: Num rows: 1066701 Data size: 175981606 Basic
stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: avg(ss_list_price), count(ss_list_price),
count(DISTINCT ss_list_price)
keys: ss_list_price (type: double)
mode: hash
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 1066701 Data size: 175981606 Basic
stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: double)
sort order: +
Statistics: Num rows: 1066701 Data size: 175981606
Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type:
struct<count:bigint,sum:double,input:double>), _col2 (type: bigint)
Map 15
Map Operator Tree:
Filter Operator
predicate: (ss_quantity BETWEEN 21 AND 25 and (ss_list_price
BETWEEN 135 AND 145 or ss_coupon_amt BETWEEN 14180 AND 15180 or
ss_wholesale_cost BETWEEN 38 AND 58)) (type: boolean)
Statistics: Num rows: 1066701 Data size: 175981606 Basic
stats: COMPLETE Column stats: NONE
Select Operator
expressions: ss_list_price (type: double)
outputColumnNames: ss_list_price
Statistics: Num rows: 1066701 Data size: 175981606 Basic
stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: avg(ss_list_price), count(ss_list_price),
count(DISTINCT ss_list_price)
keys: ss_list_price (type: double)
mode: hash
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 1066701 Data size: 175981606 Basic
stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: double)
sort order: +
Statistics: Num rows: 1066701 Data size: 175981606
Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type:
struct<count:bigint,sum:double,input:double>), _col2 (type: bigint)
Map 18
Map Operator Tree:
Filter Operator
predicate: (ss_quantity BETWEEN 26 AND 30 and (ss_list_price
BETWEEN 28 AND 38 or ss_coupon_amt BETWEEN 2513 AND 3513 or ss_wholesale_cost
BETWEEN 42 AND 62)) (type: boolean)
Statistics: Num rows: 1066701 Data size: 175981606 Basic
stats: COMPLETE Column stats: NONE
Select Operator
expressions: ss_list_price (type: double)
outputColumnNames: ss_list_price
Statistics: Num rows: 1066701 Data size: 175981606 Basic
stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: avg(ss_list_price), count(ss_list_price),
count(DISTINCT ss_list_price)
keys: ss_list_price (type: double)
mode: hash
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 1066701 Data size: 175981606 Basic
stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: double)
sort order: +
Statistics: Num rows: 1066701 Data size: 175981606
Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type:
struct<count:bigint,sum:double,input:double>), _col2 (type: bigint)
Map 2
Map Operator Tree:
Filter Operator
predicate: (ss_quantity BETWEEN 0 AND 5 and (ss_list_price
BETWEEN 11 AND 21 or ss_coupon_amt BETWEEN 460 AND 1460 or ss_wholesale_cost
BETWEEN 14 AND 34)) (type: boolean)
Statistics: Num rows: 1066701 Data size: 175981606 Basic
stats: COMPLETE Column stats: NONE
Select Operator
expressions: ss_list_price (type: double)
outputColumnNames: ss_list_price
Statistics: Num rows: 1066701 Data size: 175981606 Basic
stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: avg(ss_list_price), count(ss_list_price),
count(DISTINCT ss_list_price)
keys: ss_list_price (type: double)
mode: hash
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 1066701 Data size: 175981606 Basic
stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: double)
sort order: +
Statistics: Num rows: 1066701 Data size: 175981606
Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type:
struct<count:bigint,sum:double,input:double>), _col2 (type: bigint)
Map 6
Map Operator Tree:
Filter Operator
predicate: (ss_quantity BETWEEN 6 AND 10 and (ss_list_price
BETWEEN 91 AND 101 or ss_coupon_amt BETWEEN 1430 AND 2430 or ss_wholesale_cost
BETWEEN 32 AND 52)) (type: boolean)
Statistics: Num rows: 1066701 Data size: 175981606 Basic
stats: COMPLETE Column stats: NONE
Select Operator
expressions: ss_list_price (type: double)
outputColumnNames: ss_list_price
Statistics: Num rows: 1066701 Data size: 175981606 Basic
stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: avg(ss_list_price), count(ss_list_price),
count(DISTINCT ss_list_price)
keys: ss_list_price (type: double)
mode: hash
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 1066701 Data size: 175981606 Basic
stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: double)
sort order: +
Statistics: Num rows: 1066701 Data size: 175981606
Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type:
struct<count:bigint,sum:double,input:double>), _col2 (type: bigint)
Map 9
Map Operator Tree:
Filter Operator
predicate: (ss_quantity BETWEEN 11 AND 15 and (ss_list_price
BETWEEN 66 AND 76 or ss_coupon_amt BETWEEN 920 AND 1920 or ss_wholesale_cost
BETWEEN 4 AND 24)) (type: boolean)
Statistics: Num rows: 1066701 Data size: 175981606 Basic
stats: COMPLETE Column stats: NONE
Select Operator
expressions: ss_list_price (type: double)
outputColumnNames: ss_list_price
Statistics: Num rows: 1066701 Data size: 175981606 Basic
stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: avg(ss_list_price), count(ss_list_price),
count(DISTINCT ss_list_price)
keys: ss_list_price (type: double)
mode: hash
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 1066701 Data size: 175981606 Basic
stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: double)
sort order: +
Statistics: Num rows: 1066701 Data size: 175981606
Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type:
struct<count:bigint,sum:double,input:double>), _col2 (type: bigint)
Reducer 10
Reduce Operator Tree:
Group By Operator
aggregations: avg(VALUE._col0), count(VALUE._col1),
count(DISTINCT KEY._col0:0._col0)
mode: mergepartial
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 104 Basic stats: COMPLETE
Column stats: NONE
Reduce Output Operator
sort order:
Statistics: Num rows: 1 Data size: 104 Basic stats: COMPLETE
Column stats: NONE
value expressions: _col0 (type: double), _col1 (type:
bigint), _col2 (type: bigint)
Reducer 13
Reduce Operator Tree:
Group By Operator
aggregations: avg(VALUE._col0), count(VALUE._col1),
count(DISTINCT KEY._col0:0._col0)
mode: mergepartial
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 104 Basic stats: COMPLETE
Column stats: NONE
Reduce Output Operator
sort order:
Statistics: Num rows: 1 Data size: 104 Basic stats: COMPLETE
Column stats: NONE
value expressions: _col0 (type: double), _col1 (type:
bigint), _col2 (type: bigint)
Reducer 16
Reduce Operator Tree:
Group By Operator
aggregations: avg(VALUE._col0), count(VALUE._col1),
count(DISTINCT KEY._col0:0._col0)
mode: mergepartial
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 104 Basic stats: COMPLETE
Column stats: NONE
Reduce Output Operator
sort order:
Statistics: Num rows: 1 Data size: 104 Basic stats: COMPLETE
Column stats: NONE
value expressions: _col0 (type: double), _col1 (type:
bigint), _col2 (type: bigint)
Reducer 19
Reduce Operator Tree:
Group By Operator
aggregations: avg(VALUE._col0), count(VALUE._col1),
count(DISTINCT KEY._col0:0._col0)
mode: mergepartial
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 104 Basic stats: COMPLETE
Column stats: NONE
Reduce Output Operator
sort order:
Statistics: Num rows: 1 Data size: 104 Basic stats: COMPLETE
Column stats: NONE
value expressions: _col0 (type: double), _col1 (type:
bigint), _col2 (type: bigint)
Reducer 3
Reduce Operator Tree:
Group By Operator
aggregations: avg(VALUE._col0), count(VALUE._col1),
count(DISTINCT KEY._col0:0._col0)
mode: mergepartial
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 104 Basic stats: COMPLETE
Column stats: NONE
Reduce Output Operator
sort order:
Statistics: Num rows: 1 Data size: 104 Basic stats: COMPLETE
Column stats: NONE
value expressions: _col0 (type: double), _col1 (type:
bigint), _col2 (type: bigint)
Reducer 4
Reduce Operator Tree:
Join Operator
condition map:
Inner Join 0 to 1
Inner Join 0 to 2
Inner Join 0 to 3
Inner Join 0 to 4
Inner Join 0 to 5
keys:
0
1
2
3
4
5
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5,
_col6, _col7, _col8, _col9, _col10, _col11, _col12, _col13, _col14, _col15,
_col16, _col17
Statistics: Num rows: 1 Data size: 625 Basic stats: COMPLETE
Column stats: NONE
Limit
Number of rows: 100
Statistics: Num rows: 1 Data size: 625 Basic stats: COMPLETE
Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 625 Basic stats:
COMPLETE Column stats: NONE
table:
input format:
org.apache.hadoop.mapred.SequenceFileInputFormat
output format:
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde:
org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Reducer 7
Reduce Operator Tree:
Group By Operator
aggregations: avg(VALUE._col0), count(VALUE._col1),
count(DISTINCT KEY._col0:0._col0)
mode: mergepartial
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 104 Basic stats: COMPLETE
Column stats: NONE
Reduce Output Operator
sort order:
Statistics: Num rows: 1 Data size: 104 Basic stats: COMPLETE
Column stats: NONE
value expressions: _col0 (type: double), _col1 (type:
bigint), _col2 (type: bigint)
Stage: Stage-0
Fetch Operator
limit: 100
Processor Tree:
ListSink
{code}
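To make the shape above concrete, here is a minimal sketch (plain Python, not Hive code) of one shared "table scan" feeding several filter-plus-aggregate branches, mirroring the single TS in Map 1 and the six downstream Maps. The column names (ss_quantity, ss_list_price) and the avg/count/count(distinct) aggregations follow the plan; the sample rows and the branch bounds are made up for illustration.

```python
# One shared scan, many consumers: an illustrative sketch of the
# M-M-R shape in the plan above. Not Hive internals.

rows = [
    {"ss_quantity": 3,  "ss_list_price": 15.0},
    {"ss_quantity": 4,  "ss_list_price": 18.0},
    {"ss_quantity": 8,  "ss_list_price": 95.0},
    {"ss_quantity": 13, "ss_list_price": 70.0},
]

# Each branch is a quantity range, like the six Filter Operators.
branches = [(0, 5), (6, 10), (11, 15)]

def aggregate(prices):
    """avg, count, count(distinct) over ss_list_price, as in the plan."""
    if not prices:
        return (None, 0, 0)
    return (sum(prices) / len(prices), len(prices), len(set(prices)))

# The scan happens once; every branch reuses the same cached rows,
# which is the point of sharing the TS instead of scanning per branch.
scanned = list(rows)  # stands in for the cached output of the shared TS
results = {
    (lo, hi): aggregate([r["ss_list_price"] for r in scanned
                         if lo <= r["ss_quantity"] <= hi])
    for lo, hi in branches
}
print(results)
```

In the real plan the branch results are then joined in Reducer 4; here the dict of per-branch aggregates plays that role.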
> Enable SharedWorkOptimizer in tez on HOS
> ----------------------------------------
>
> Key: HIVE-17486
> URL: https://issues.apache.org/jira/browse/HIVE-17486
> Project: Hive
> Issue Type: Bug
> Reporter: liyunzhang
> Assignee: liyunzhang
> Attachments: scanshare.after.svg, scanshare.before.svg
>
>
> In HIVE-16602, shared scans were implemented with Tez. Given a query plan,
> the goal is to identify scans on input tables that can be merged so the data
> is read only once. The optimization is carried out at the physical level. In
> Hive on Spark, the result of a spark work is cached if that spark work is
> used by more than one child spark work. After SharedWorkOptimizer is enabled
> in the physical plan in HoS, identical table scans are merged into one table
> scan. The result of this table scan is then used by more than one child
> spark work, so the cache mechanism saves us from repeating the same
> computation.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)