[
https://issues.apache.org/jira/browse/HIVE-17193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214603#comment-16214603
]
liyunzhang_intel commented on HIVE-17193:
-----------------------------------------
[~lirui]: I remember this problem from when I developed HIVE-16948, but I cannot
reproduce it on Hive (commit a51ae9c) now:
{code}
set hive.explain.user=false;
set hive.spark.dynamic.partition.pruning=true;
set hive.tez.dynamic.partition.pruning=true;
set hive.auto.convert.join=false;
explain
select * from
(select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.key) a
join
(select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.value) b
on a.key=b.key;
{code}
The explain output:
{code}
STAGE DEPENDENCIES:
Stage-2 is a root stage
Stage-1 depends on stages: Stage-2
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-2
Spark
DagName: root_20171022233200_990c146c-b49f-49b9-9a5b-a0028e34f200:2
Vertices:
Map 8
Map Operator Tree:
TableScan
alias: src
Statistics: Num rows: 58 Data size: 5812 Basic stats:
COMPLETE Column stats: NONE
Filter Operator
predicate: key is not null (type: boolean)
Statistics: Num rows: 58 Data size: 5812 Basic stats:
COMPLETE Column stats: NONE
Select Operator
expressions: key (type: string)
outputColumnNames: _col0
Statistics: Num rows: 58 Data size: 5812 Basic stats:
COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: string)
outputColumnNames: _col0
Statistics: Num rows: 58 Data size: 5812 Basic stats:
COMPLETE Column stats: NONE
Group By Operator
keys: _col0 (type: string)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 58 Data size: 5812 Basic stats:
COMPLETE Column stats: NONE
Spark Partition Pruning Sink Operator
Target column: ds (string)
partition key expr: ds
Statistics: Num rows: 58 Data size: 5812 Basic
stats: COMPLETE Column stats: NONE
target work: Map 1
Map 9
Map Operator Tree:
TableScan
alias: src
Statistics: Num rows: 58 Data size: 5812 Basic stats:
COMPLETE Column stats: NONE
Filter Operator
predicate: value is not null (type: boolean)
Statistics: Num rows: 58 Data size: 5812 Basic stats:
COMPLETE Column stats: NONE
Select Operator
expressions: value (type: string)
outputColumnNames: _col0
Statistics: Num rows: 58 Data size: 5812 Basic stats:
COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: string)
outputColumnNames: _col0
Statistics: Num rows: 58 Data size: 5812 Basic stats:
COMPLETE Column stats: NONE
Group By Operator
keys: _col0 (type: string)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 58 Data size: 5812 Basic stats:
COMPLETE Column stats: NONE
Spark Partition Pruning Sink Operator
Target column: ds (string)
partition key expr: ds
Statistics: Num rows: 58 Data size: 5812 Basic
stats: COMPLETE Column stats: NONE
target work: Map 5
Stage: Stage-1
Spark
Edges:
Reducer 2 <- Map 1 (PARTITION-LEVEL SORT, 1), Map 4 (PARTITION-LEVEL SORT, 1)
Reducer 3 <- Reducer 2 (PARTITION-LEVEL SORT, 1), Reducer 6 (PARTITION-LEVEL SORT, 1)
Reducer 6 <- Map 5 (PARTITION-LEVEL SORT, 1), Map 7 (PARTITION-LEVEL SORT, 1)
DagName: root_20171022233200_990c146c-b49f-49b9-9a5b-a0028e34f200:1
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: srcpart
Statistics: Num rows: 232 Data size: 23248 Basic stats:
COMPLETE Column stats: NONE
Filter Operator
predicate: key is not null (type: boolean)
Statistics: Num rows: 232 Data size: 23248 Basic stats:
COMPLETE Column stats: NONE
Select Operator
expressions: key (type: string), ds (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 232 Data size: 23248 Basic stats:
COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col1 (type: string)
sort order: +
Map-reduce partition columns: _col1 (type: string)
Statistics: Num rows: 232 Data size: 23248 Basic stats:
COMPLETE Column stats: NONE
value expressions: _col0 (type: string)
Map 4
Map Operator Tree:
TableScan
alias: src
Statistics: Num rows: 58 Data size: 5812 Basic stats:
COMPLETE Column stats: NONE
Filter Operator
predicate: key is not null (type: boolean)
Statistics: Num rows: 58 Data size: 5812 Basic stats:
COMPLETE Column stats: NONE
Select Operator
expressions: key (type: string)
outputColumnNames: _col0
Statistics: Num rows: 58 Data size: 5812 Basic stats:
COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 58 Data size: 5812 Basic stats:
COMPLETE Column stats: NONE
Map 5
Map Operator Tree:
TableScan
alias: srcpart
Statistics: Num rows: 232 Data size: 23248 Basic stats:
COMPLETE Column stats: NONE
Filter Operator
predicate: key is not null (type: boolean)
Statistics: Num rows: 232 Data size: 23248 Basic stats:
COMPLETE Column stats: NONE
Select Operator
expressions: key (type: string), ds (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 232 Data size: 23248 Basic stats:
COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col1 (type: string)
sort order: +
Map-reduce partition columns: _col1 (type: string)
Statistics: Num rows: 232 Data size: 23248 Basic stats:
COMPLETE Column stats: NONE
value expressions: _col0 (type: string)
Map 7
Map Operator Tree:
TableScan
alias: src
Statistics: Num rows: 58 Data size: 5812 Basic stats:
COMPLETE Column stats: NONE
Filter Operator
predicate: value is not null (type: boolean)
Statistics: Num rows: 58 Data size: 5812 Basic stats:
COMPLETE Column stats: NONE
Select Operator
expressions: value (type: string)
outputColumnNames: _col0
Statistics: Num rows: 58 Data size: 5812 Basic stats:
COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 58 Data size: 5812 Basic stats:
COMPLETE Column stats: NONE
Reducer 2
Reduce Operator Tree:
Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col1 (type: string)
1 _col0 (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 255 Data size: 25572 Basic stats:
COMPLETE Column stats: NONE
Select Operator
expressions: _col1 (type: string), _col0 (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 255 Data size: 25572 Basic stats:
COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col1 (type: string)
sort order: +
Map-reduce partition columns: _col1 (type: string)
Statistics: Num rows: 255 Data size: 25572 Basic stats:
COMPLETE Column stats: NONE
value expressions: _col0 (type: string)
Reducer 3
Reduce Operator Tree:
Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col1 (type: string)
1 _col1 (type: string)
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 280 Data size: 28129 Basic stats:
COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 280 Data size: 28129 Basic stats:
COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Reducer 6
Reduce Operator Tree:
Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col1 (type: string)
1 _col0 (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 255 Data size: 25572 Basic stats:
COMPLETE Column stats: NONE
Select Operator
expressions: _col1 (type: string), _col0 (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 255 Data size: 25572 Basic stats:
COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col1 (type: string)
sort order: +
Map-reduce partition columns: _col1 (type: string)
Statistics: Num rows: 255 Data size: 25572 Basic stats:
COMPLETE Column stats: NONE
value expressions: _col0 (type: string)
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
{code}
I guess what you mean in the JIRA is that Map 8 and Map 9 are combined into one
map in your env because the operators in these two maps are the same. They are
not combined in my env because the filter operators in Map 8 and Map 9
differ.
{code}
Map 8
Filter Operator
predicate: key is not null (type: boolean)
Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
{code}
{code}
Map 9
Filter Operator
predicate: value is not null (type: boolean)
Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
{code}
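To make the comparison concrete, here is a minimal sketch of the idea, assuming
two map works are candidates for combining only when their operator chains match
one by one. This is not Hive's actual code: the class MapWorkCombineSketch, its
methods, and the string encoding of operators are all hypothetical, and the
operator descriptions are copied by hand from the Map 8 and Map 9 vertices in
the plan (the "target work" of each pruning sink is left out).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; Hive's real optimizer is not modeled here.
public class MapWorkCombineSketch {

    // Operator chain of Map 8, transcribed from the explain output.
    public static List<String> map8Operators() {
        return Arrays.asList(
            "TableScan[alias: src]",
            "Filter[key is not null]",
            "Select[key]",
            "Select[_col0]",
            "GroupBy[keys: _col0, mode: hash]",
            "SparkPartitionPruningSink[target column: ds]");
    }

    // Operator chain of Map 9, transcribed from the explain output.
    public static List<String> map9Operators() {
        return Arrays.asList(
            "TableScan[alias: src]",
            "Filter[value is not null]",
            "Select[value]",
            "Select[_col0]",
            "GroupBy[keys: _col0, mode: hash]",
            "SparkPartitionPruningSink[target column: ds]");
    }

    // Assumed rule: two works are equivalent only if every operator matches.
    public static boolean sameOperatorChain(List<String> a, List<String> b) {
        return a.equals(b);
    }

    // Index of the first operator that differs, or -1 if the chains are equal.
    public static int firstDifference(List<String> a, List<String> b) {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            if (!a.get(i).equals(b.get(i))) {
                return i;
            }
        }
        return a.size() == b.size() ? -1 : n;
    }

    public static void main(String[] args) {
        List<String> map8 = map8Operators();
        List<String> map9 = map9Operators();
        System.out.println("same chain: " + sameOperatorChain(map8, map9));
        int i = firstDifference(map8, map9);
        if (i >= 0) {
            System.out.println("first difference at operator " + i + ": "
                + map8.get(i) + " vs " + map9.get(i));
        }
    }
}
```

Under this assumed rule the two chains first diverge at the Filter Operator,
which matches what I see in my env; if both subqueries filtered on the same
column the chains would be identical and the works could be merged.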
Could you provide your script? Thanks!
> HoS: don't combine map works that are targets of different DPPs
> ---------------------------------------------------------------
>
> Key: HIVE-17193
> URL: https://issues.apache.org/jira/browse/HIVE-17193
> Project: Hive
> Issue Type: Bug
> Reporter: Rui Li
> Assignee: Rui Li
>
> Suppose {{srcpart}} is partitioned by {{ds}}. The following query can trigger
> the issue:
> {code}
> explain
> select * from
> (select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.key) a
> join
> (select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.value) b
> on a.key=b.key;
> {code}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)