[
https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16053446#comment-16053446
]
liyunzhang_intel commented on HIVE-11297:
-----------------------------------------
[~csun]: I printed the operator tree of multi_column_single_source.q while
debugging in
[SplitOpTreeForDPP|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SplitOpTreeForDPP.java#L75].
The query is
{code}
set hive.execution.engine=spark;
set hive.auto.convert.join.noconditionaltask.size=20;
set hive.spark.dynamic.partition.pruning=true;
select count(*) from srcpart join srcpart_date_hour on (srcpart.ds =
srcpart_date_hour.ds and srcpart.hr = srcpart_date_hour.hr) where
srcpart_date_hour.`date` = '2008-04-08' and srcpart_date_hour.hour = 11;
{code}
Physical plan:
{code}
TS[1]-FIL[17]-RS[4]-JOIN[5]-GBY[8]-RS[9]-GBY[10]-FS[12]
-SEL[18]-GBY[19]-SPARKPRUNINGSINK[20]
-SEL[21]-GBY[22]-SPARKPRUNINGSINK[23]
{code}
{noformat}RS[4], SEL[18] and SEL[21] are children of FIL[17]{noformat}
bq. I think in the original code the parent node of all branches is a filter
op, but now it is changed
I don't think so; I think the filter op is still {noformat}FIL[17]{noformat}.
The difference is in how the tree is split. Previously, we split the above tree
into three trees:
{noformat}
tree1: TS[1]-FIL[17]-RS[4]-JOIN[5]-GBY[8]-RS[9]-GBY[10]-FS[12]
tree2: TS[1]-FIL[17]-SEL[18]-GBY[19]-SPARKPRUNINGSINK[20]
tree3: TS[1]-FIL[17]-SEL[21]-GBY[22]-SPARKPRUNINGSINK[23]
{noformat}
Now we split the above tree into two trees:
{noformat}
tree1: TS[1]-FIL[17]-RS[4]-JOIN[5]-GBY[8]-RS[9]-GBY[10]-FS[12]
tree2: TS[1]-FIL[17]-SEL[18]-GBY[19]-SPARKPRUNINGSINK[20]
-SEL[21]-GBY[22]-SPARKPRUNINGSINK[23]
{noformat}
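To make the difference concrete, here is a toy model of the two splitting strategies in Python. It is only a sketch of the idea, not the actual SplitOpTreeForDPP code: the Op class and the functions build_plan, paths, split_per_sink and split_combined are hypothetical names made up for this illustration, and each resulting tree is represented simply as a list of root-to-leaf paths.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """Hypothetical stand-in for a Hive operator: a name plus child operators."""
    name: str
    children: list = field(default_factory=list)


def chain(*names):
    """Build a linear chain of operators and return (first, last)."""
    ops = [Op(n) for n in names]
    for parent, child in zip(ops, ops[1:]):
        parent.children.append(child)
    return ops[0], ops[-1]


def build_plan():
    """Rebuild the plan from this comment: FIL[17] has three children."""
    ts, fil = chain("TS[1]", "FIL[17]")
    main, _ = chain("RS[4]", "JOIN[5]", "GBY[8]", "RS[9]", "GBY[10]", "FS[12]")
    prune_a, _ = chain("SEL[18]", "GBY[19]", "SPARKPRUNINGSINK[20]")
    prune_b, _ = chain("SEL[21]", "GBY[22]", "SPARKPRUNINGSINK[23]")
    fil.children = [main, prune_a, prune_b]
    return ts


def paths(op):
    """Return every root-to-leaf path below op as a list of operator names."""
    if not op.children:
        return [[op.name]]
    return [[op.name] + rest for child in op.children for rest in paths(child)]


def is_pruning(path):
    """A path belongs to a pruning branch if it ends in a pruning sink."""
    return path[-1].startswith("SPARKPRUNINGSINK")


def split_per_sink(root):
    """Previous behavior: one tree per pruning sink, plus the main tree."""
    all_paths = paths(root)
    main = [p for p in all_paths if not is_pruning(p)]
    return [main] + [[p] for p in all_paths if is_pruning(p)]


def split_combined(root):
    """Current behavior: all pruning sinks share one tree (one table scan)."""
    all_paths = paths(root)
    main = [p for p in all_paths if not is_pruning(p)]
    pruning = [p for p in all_paths if is_pruning(p)]
    return [main, pruning]


if __name__ == "__main__":
    for label, split in (("before", split_per_sink), ("now", split_combined)):
        trees = split(build_plan())
        print(label, "->", len(trees), "trees")
        for i, tree in enumerate(trees, start=1):
            for path in tree:
                print(f"  tree{i}: " + "-".join(path))
```

Since every tree in this model starts at TS[1], the number of trees equals the number of times the small table is scanned: three before the change, two after it.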
> Combine op trees for partition info generating tasks [Spark branch]
> -------------------------------------------------------------------
>
> Key: HIVE-11297
> URL: https://issues.apache.org/jira/browse/HIVE-11297
> Project: Hive
> Issue Type: Bug
> Affects Versions: spark-branch
> Reporter: Chao Sun
> Assignee: liyunzhang_intel
> Attachments: HIVE-11297.1.patch, HIVE-11297.2.patch,
> HIVE-11297.3.patch, HIVE-11297.4.patch, HIVE-11297.5.patch
>
>
> Currently, for dynamic partition pruning in Spark, if a small table generates
> partition info for more than one partition columns, multiple operator trees
> are created, which all start from the same table scan op, but have different
> spark partition pruning sinks.
> As an optimization, we can combine these op trees and so don't have to do
> table scan multiple times.