[ https://issues.apache.org/jira/browse/HIVE-17193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16214728#comment-16214728 ]

liyunzhang commented on HIVE-17193:
-----------------------------------

[~lirui]:
{quote}
1. The simplest solution is: if the DPP works' IDs (tracked by the target map works) are different, then we consider the target map works different and don't combine them.
2. Another solution is to walk the parent tasks first and combine equivalent DPP works. Two DPP works can be considered equivalent as long as they output the same records.
{quote}
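Solution #1 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Hive's code: `MapWorkInfo`, `operatorSignature`, `dppSourceIds` and `canCombine` are hypothetical names, and the operator-tree comparison is reduced to a string. The point is only that two map works stay separate whenever the sets of DPP works pruning them differ, even if their operator trees match.

```java
import java.util.Objects;
import java.util.Set;

// Hypothetical, simplified model (NOT Hive's MapWork/SparkPartitionPruningSinkDesc):
// a map work is reduced to an operator-tree signature plus the IDs of the
// DPP works that target it.
public class DppCombineCheck {

    static final class MapWorkInfo {
        final String name;              // e.g. "Map 1"
        final String operatorSignature; // stand-in for a real operator-tree comparison
        final Set<String> dppSourceIds; // IDs of DPP works targeting this map work

        MapWorkInfo(String name, String operatorSignature, Set<String> dppSourceIds) {
            this.name = name;
            this.operatorSignature = operatorSignature;
            this.dppSourceIds = dppSourceIds;
        }
    }

    // Solution #1: combine two map works only if their operator trees match
    // AND they are targeted by exactly the same set of DPP works.
    static boolean canCombine(MapWorkInfo a, MapWorkInfo b) {
        return Objects.equals(a.operatorSignature, b.operatorSignature)
                && a.dppSourceIds.equals(b.dppSourceIds);
    }

    public static void main(String[] args) {
        // Assumed scenario: both srcpart scans look identical, but Map 1 is
        // pruned by Map 8 while Map 5 is pruned by Map 9.
        MapWorkInfo map1 = new MapWorkInfo("Map 1", "TS[srcpart]", Set.of("Map 8"));
        MapWorkInfo map5 = new MapWorkInfo("Map 5", "TS[srcpart]", Set.of("Map 9"));
        System.out.println(canCombine(map1, map5)); // false: different DPP sources
    }
}
```

The check is deliberately conservative: it may keep apart two map works whose DPP works happen to produce the same records, which is the case solution #2 tries to recover.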
#1 can be implemented on top of the current code. For #2, how would we compare the results of DPP works at physical-plan time? Do you mean directly comparing the estimated data size (e.g. Statistics: Num rows: 58 Data size: 5812)? For example, these two DPP works have identical estimates:

{code}
        Map 9
            Map Operator Tree:
                TableScan
                  alias: src
                  Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: value is not null (type: boolean)
                    Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: value (type: string)
                      outputColumnNames: _col0
                      Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                      Select Operator
                        expressions: _col0 (type: string)
                        outputColumnNames: _col0
                        Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                        Group By Operator
                          keys: _col0 (type: string)
                          mode: hash
                          outputColumnNames: _col0
                          Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                          Spark Partition Pruning Sink Operator
                            Target column: ds (string)
                            partition key expr: ds
                            Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                            target work: Map 5
{code}


{code}
        Map 8
            Map Operator Tree:
                TableScan
                  alias: src
                  Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                    Select Operator
                      expressions: key (type: string)
                      outputColumnNames: _col0
                      Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                      Select Operator
                        expressions: _col0 (type: string)
                        outputColumnNames: _col0
                        Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                        Group By Operator
                          keys: _col0 (type: string)
                          mode: hash
                          outputColumnNames: _col0
                          Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                          Spark Partition Pruning Sink Operator
                            Target column: ds (string)
                            partition key expr: ds
                            Statistics: Num rows: 58 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
                            target work: Map 1
{code}
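Why estimated statistics alone may not settle equivalence: Map 9 and Map 8 report the same estimates (58 rows, 5812 bytes) at every operator, yet Map 9 filters and projects `value` while Map 8 uses `key`, so they emit different records. The Java sketch below contrasts a statistics-only comparison with a per-operator structural comparison. It assumes a hypothetical, flattened operator model (`Op`, `sameStats`, `sameStructure` are illustrative names, not Hive's `Operator` API) and omits the pruning sink itself.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical, simplified operator chain (NOT Hive's Operator classes).
public class DppEquivalence {

    static final class Op {
        final String type;    // e.g. "TS", "FIL", "SEL", "GBY"
        final String expr;    // table / predicate / expression / key as printed by EXPLAIN
        final long numRows;   // estimated row count
        final long dataSize;  // estimated data size

        Op(String type, String expr, long numRows, long dataSize) {
            this.type = type;
            this.expr = expr;
            this.numRows = numRows;
            this.dataSize = dataSize;
        }
    }

    // Statistics-only comparison: different expressions can yield identical
    // estimates, so equal stats do not imply equal output records.
    static boolean sameStats(List<Op> a, List<Op> b) {
        if (a.size() != b.size()) return false;
        for (int i = 0; i < a.size(); i++) {
            if (a.get(i).numRows != b.get(i).numRows
                    || a.get(i).dataSize != b.get(i).dataSize) return false;
        }
        return true;
    }

    // Structural comparison: same operator type and expression at each position.
    // Conservative: matching chains over the same input produce the same records.
    static boolean sameStructure(List<Op> a, List<Op> b) {
        if (a.size() != b.size()) return false;
        for (int i = 0; i < a.size(); i++) {
            Op x = a.get(i), y = b.get(i);
            if (!x.type.equals(y.type) || !Objects.equals(x.expr, y.expr)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<Op> map9 = List.of(
                new Op("TS", "src", 58, 5812),
                new Op("FIL", "value is not null", 58, 5812),
                new Op("SEL", "value", 58, 5812),
                new Op("SEL", "_col0", 58, 5812),
                new Op("GBY", "_col0", 58, 5812));
        List<Op> map8 = List.of(
                new Op("TS", "src", 58, 5812),
                new Op("FIL", "key is not null", 58, 5812),
                new Op("SEL", "key", 58, 5812),
                new Op("SEL", "_col0", 58, 5812),
                new Op("GBY", "_col0", 58, 5812));
        System.out.println(sameStats(map8, map9));     // true
        System.out.println(sameStructure(map8, map9)); // false
    }
}
```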


> HoS: don't combine map works that are targets of different DPPs
> ---------------------------------------------------------------
>
>                 Key: HIVE-17193
>                 URL: https://issues.apache.org/jira/browse/HIVE-17193
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rui Li
>            Assignee: Rui Li
>
> Suppose {{srcpart}} is partitioned by {{ds}}. The following query can trigger the issue:
> {code}
> explain
> select * from
>   (select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.key) a
> join
>   (select srcpart.ds,srcpart.key from srcpart join src on srcpart.ds=src.value) b
> on a.key=b.key;
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)