[
https://issues.apache.org/jira/browse/HIVE-17193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214728#comment-16214728
]
liyunzhang commented on HIVE-17193:
-----------------------------------
[~lirui]:
{quote}
1. The simplest solution is, if the DPP works' IDs (tracked by the target map
works) are different, then we consider the target map works are different and
don't combine them.
2. Another solution is we walk the parent tasks first, and combine equivalent
DPP works. Two DPP works can be considered equivalent as long as they output
same records.
{quote}
#1 can be implemented on top of the current code. For #2, how would we compare
the output of two DPP works during physical planning? Do you mean directly
comparing the estimated data size (Statistics: Num rows: 58 Data size: 5812)?
For example, Map 9 and Map 8 below report the same statistics:
{code}
Map 9
  Map Operator Tree:
    TableScan
      alias: src
      Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
      Filter Operator
        predicate: value is not null (type: boolean)
        Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
        Select Operator
          expressions: value (type: string)
          outputColumnNames: _col0
          Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col0 (type: string)
            outputColumnNames: _col0
            Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
            Group By Operator
              keys: _col0 (type: string)
              mode: hash
              outputColumnNames: _col0
              Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
              Spark Partition Pruning Sink Operator
                Target column: ds (string)
                partition key expr: ds
                Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                target work: Map 5
{code}
{code}
Map 8
  Map Operator Tree:
    TableScan
      alias: src
      Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
      Filter Operator
        predicate: key is not null (type: boolean)
        Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
        Select Operator
          expressions: key (type: string)
          outputColumnNames: _col0
          Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col0 (type: string)
            outputColumnNames: _col0
            Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
            Group By Operator
              keys: _col0 (type: string)
              mode: hash
              outputColumnNames: _col0
              Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
              Spark Partition Pruning Sink Operator
                Target column: ds (string)
                partition key expr: ds
                Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                target work: Map 1
{code}
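To make the question about #2 concrete, here is a rough sketch (plain Python, not Hive code) that models the two plans above as flat operator lists and compares them in two ways: by statistics only, and by operator kind plus operator detail. The names {{Op}}, {{make_dpp_work}}, {{stats_equal}} and {{structure_equal}} are hypothetical and exist only for this illustration. The target work (Map 5 vs. Map 1) is deliberately left out of the comparison, on the assumption that equivalence should depend only on the records a DPP work outputs.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Op:
    """One operator in a simplified plan. Hypothetical type, not a Hive class."""
    kind: str        # e.g. "Filter", "Select"
    detail: str      # predicate, expressions, keys, ...
    num_rows: int    # estimated row count from the explain output
    data_size: int   # estimated data size from the explain output


def make_dpp_work(column: str) -> Tuple[Op, ...]:
    """Build the operator chain shown above, parameterised by the src column."""
    return (
        Op("TableScan", "alias: src", 58, 5812),
        Op("Filter", f"{column} is not null", 58, 5812),
        Op("Select", column, 58, 5812),
        Op("Select", "_col0", 58, 5812),
        Op("GroupBy", "keys: _col0, mode: hash", 58, 5812),
        Op("SparkPartitionPruningSink", "target column: ds", 58, 5812),
    )


def stats_equal(a: Tuple[Op, ...], b: Tuple[Op, ...]) -> bool:
    """Compare only the estimated statistics of each operator."""
    return [(o.num_rows, o.data_size) for o in a] == \
           [(o.num_rows, o.data_size) for o in b]


def structure_equal(a: Tuple[Op, ...], b: Tuple[Op, ...]) -> bool:
    """Compare operator kinds and details, ignoring statistics."""
    return [(o.kind, o.detail) for o in a] == \
           [(o.kind, o.detail) for o in b]


map9 = make_dpp_work("value")  # prunes on src.value
map8 = make_dpp_work("key")    # prunes on src.key

print("same statistics:", stats_equal(map8, map9))
print("same structure: ", structure_equal(map8, map9))
```

In this toy model the statistics of Map 8 and Map 9 match exactly even though one filters and selects {{key}} and the other {{value}}, so a statistics-only comparison would treat them as equivalent. A structural comparison of the operator trees tells them apart.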
> HoS: don't combine map works that are targets of different DPPs
> ---------------------------------------------------------------
>
> Key: HIVE-17193
> URL: https://issues.apache.org/jira/browse/HIVE-17193
> Project: Hive
> Issue Type: Bug
> Reporter: Rui Li
> Assignee: Rui Li
>
> Suppose {{srcpart}} is partitioned by {{ds}}. The following query can trigger
> the issue:
> {code}
> explain
> select * from
> (select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.key) a
> join
> (select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.value) b
> on a.key=b.key;
> {code}