[
https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036554#comment-16036554
]
liyunzhang_intel commented on HIVE-11297:
-----------------------------------------
[~csun]: thanks for the review. I replied to you on the review board.
bq.Seems this removes the extra map work after it was generated. Is there a way
to avoid generating the map work in the first place?
The physical operator tree will be split by the Spark partition pruning sink.
Original tree:
{noformat}
TS[1]-FIL[17]-RS[4]-JOIN[5]
-SEL[18]-GBY[19]-SPARKPRUNINGSINK[20]
-SEL[21]-GBY[22]-SPARKPRUNINGSINK[23]
{noformat}
After splitting by the Spark partition pruning sink:
{noformat}
TS[1]-FIL[17]-RS[4]-JOIN[5]
TS[1]-FIL[17]-SEL[18]-GBY[19]-SPARKPRUNINGSINK[20]
TS[1]-FIL[17]-SEL[21]-GBY[22]-SPARKPRUNINGSINK[23]
{noformat}
If we want to avoid generating multiple map
works ({noformat}TS[1]-FIL[17]-SEL[18]-GBY[19]-SPARKPRUNINGSINK[20], TS[1]-FIL[17]-SEL[21]-GBY[22]-SPARKPRUNINGSINK[23]{noformat}),
we need to remove the rule for Spark dynamic partition pruning. If we remove that
rule, an exception will be thrown because the remaining tree will not be in a
MapWork (
{noformat}
-SEL[18]-GBY[19]-SPARKPRUNINGSINK[20]
-SEL[21]-GBY[22]-SPARKPRUNINGSINK[23]
{noformat}
)
{code}
opRules.put(new RuleRegExp("Split Work - SparkPartitionPruningSink",
SparkPartitionPruningSinkOperator.getOperatorName() + "%"), genSparkWork);
{code}
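As a hypothetical illustration (not Hive's actual GenSparkWork code; the `Op` and `splitPaths` names are made up for this sketch), the splitting behavior above can be modeled as cutting each root-to-leaf branch into its own work, which duplicates the shared TS-FIL prefix once per pruning sink:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of an operator-tree node; illustrative only, not a Hive API.
class Op {
    final String name;
    final List<Op> children = new ArrayList<>();
    Op(String name) { this.name = name; }
    Op add(Op child) { children.add(child); return child; }
}

public class SplitSketch {
    // Collect one root-to-leaf path per branch: each branch becomes a separate
    // work, so the shared TS[1]-FIL[17] prefix is repeated for every pruning sink.
    static void split(Op op, List<String> prefix, List<List<String>> out) {
        List<String> path = new ArrayList<>(prefix);
        path.add(op.name);
        if (op.children.isEmpty()) {
            out.add(path);
        } else {
            for (Op child : op.children) {
                split(child, path, out);
            }
        }
    }

    static List<List<String>> splitPaths() {
        // Original tree from the comment: TS-FIL fans out into a join path
        // and two Spark partition pruning sink paths.
        Op ts = new Op("TS[1]");
        Op fil = ts.add(new Op("FIL[17]"));
        fil.add(new Op("RS[4]")).add(new Op("JOIN[5]"));
        fil.add(new Op("SEL[18]")).add(new Op("GBY[19]")).add(new Op("SPARKPRUNINGSINK[20]"));
        fil.add(new Op("SEL[21]")).add(new Op("GBY[22]")).add(new Op("SPARKPRUNINGSINK[23]"));
        List<List<String>> out = new ArrayList<>();
        split(ts, new ArrayList<>(), out);
        return out;
    }

    public static void main(String[] args) {
        // Prints the three split works, matching the trees in the comment.
        for (List<String> work : splitPaths()) {
            System.out.println(String.join("-", work));
        }
    }
}
```

Running this prints the same three works shown in the "after split" block above, which makes the problem visible: TS[1] appears three times, one scan per work.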
If you have any ideas about this, please share your suggestions.
> Combine op trees for partition info generating tasks [Spark branch]
> -------------------------------------------------------------------
>
> Key: HIVE-11297
> URL: https://issues.apache.org/jira/browse/HIVE-11297
> Project: Hive
> Issue Type: Bug
> Affects Versions: spark-branch
> Reporter: Chao Sun
> Assignee: liyunzhang_intel
> Attachments: HIVE-11297.1.patch
>
>
> Currently, for dynamic partition pruning in Spark, if a small table generates
> partition info for more than one partition column, multiple operator trees
> are created, all starting from the same table scan op but with different
> Spark partition pruning sinks.
> As an optimization, we can combine these op trees so that we don't have to
> scan the table multiple times.
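The proposed combination can be sketched as a prefix merge: op-tree paths that share a common prefix are folded into one tree, so the table scan appears once however many pruning sinks hang off it. This is a hypothetical model (the `combine`/`countOp` helpers are invented for illustration), not Hive's actual implementation:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not Hive code): merge op-tree paths that share a common
// prefix into one nested-map "tree", so TS[1] occurs once instead of once per
// pruning-sink work.
public class CombineSketch {
    @SuppressWarnings("unchecked")
    static Map<String, Object> combine(List<List<String>> paths) {
        Map<String, Object> root = new LinkedHashMap<>();
        for (List<String> path : paths) {
            Map<String, Object> node = root;
            for (String op : path) {
                // Reuse an existing child for a shared prefix; create otherwise.
                node = (Map<String, Object>) node.computeIfAbsent(
                        op, k -> new LinkedHashMap<String, Object>());
            }
        }
        return root;
    }

    // Count occurrences of an operator name in the combined tree.
    @SuppressWarnings("unchecked")
    static int countOp(Map<String, Object> node, String name) {
        int n = 0;
        for (Map.Entry<String, Object> e : node.entrySet()) {
            if (e.getKey().equals(name)) n++;
            n += countOp((Map<String, Object>) e.getValue(), name);
        }
        return n;
    }

    public static void main(String[] args) {
        // The two pruning-sink works produced by the split.
        List<List<String>> works = Arrays.asList(
            Arrays.asList("TS[1]", "FIL[17]", "SEL[18]", "GBY[19]", "SPARKPRUNINGSINK[20]"),
            Arrays.asList("TS[1]", "FIL[17]", "SEL[21]", "GBY[22]", "SPARKPRUNINGSINK[23]"));
        Map<String, Object> combined = combine(works);
        // After combining, the shared TS[1]-FIL[17] prefix is kept once.
        System.out.println("table scans before: " + works.size()
                + ", after: " + countOp(combined, "TS[1]"));
    }
}
```

The combined tree keeps one TS[1]-FIL[17] prefix feeding both GBY/pruning-sink branches, which is exactly the single-scan shape the optimization aims for.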
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)