[
https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036554#comment-16036554
]
liyunzhang_intel commented on HIVE-11297:
-----------------------------------------
[~csun]: thanks for the review. I replied to you on the review board.
bq.Seems this removes the extra map work after it was generated. Is there a way
to avoid generating the map work in the first place?
The physical operator tree will be split by the Spark partition pruning sink.
Original tree:
{noformat}
TS[1]-FIL[17]-RS[4]-JOIN[5]
-SEL[18]-GBY[19]-SPARKPRUNINGSINK[20]
-SEL[21]-GBY[22]-SPARKPRUNINGSINK[23]
{noformat}
After splitting by the Spark partition pruning sink:
{noformat}
TS[1]-FIL[17]-RS[4]-JOIN[5]
TS[1]-FIL[17]-SEL[18]-GBY[19]-SPARKPRUNINGSINK[20]
TS[1]-FIL[17]-SEL[21]-GBY[22]-SPARKPRUNINGSINK[23]
{noformat}
If we want to avoid generating multiple map
works ({noformat}TS[1]-FIL[17]-SEL[18]-GBY[19]-SPARKPRUNINGSINK[20], TS[1]-FIL[17]-SEL[21]-GBY[22]-SPARKPRUNINGSINK[23]{noformat}),
we need to remove the rule for Spark dynamic partition pruning. If we remove that
rule, an exception will be thrown because the remaining tree will not be in a
MapWork (
{noformat}
-SEL[18]-GBY[19]-SPARKPRUNINGSINK[20]
-SEL[21]-GBY[22]-SPARKPRUNINGSINK[23]
{noformat}
)
{code}
opRules.put(new RuleRegExp("Split Work - SparkPartitionPruningSink",
SparkPartitionPruningSinkOperator.getOperatorName() + "%"), genSparkWork);
{code}
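As a hypothetical illustration (not Hive's actual GenSparkWork code; the `Op` and `splitPaths` names are made up for this sketch), the splitting behavior above can be modeled as cutting each root-to-leaf branch into its own work, which duplicates the shared TS-FIL prefix once per pruning sink:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of an operator-tree node; illustrative only, not a Hive API.
class Op {
    final String name;
    final List<Op> children = new ArrayList<>();
    Op(String name) { this.name = name; }
    Op add(Op child) { children.add(child); return child; }
}

public class SplitSketch {
    // Collect one root-to-leaf path per branch: each branch becomes a separate
    // work, so the shared TS[1]-FIL[17] prefix is repeated for every pruning sink.
    static void split(Op op, List<String> prefix, List<List<String>> out) {
        List<String> path = new ArrayList<>(prefix);
        path.add(op.name);
        if (op.children.isEmpty()) {
            out.add(path);
        } else {
            for (Op child : op.children) {
                split(child, path, out);
            }
        }
    }

    static List<List<String>> splitPaths() {
        // Original tree from the comment: TS-FIL fans out into a join path
        // and two Spark partition pruning sink paths.
        Op ts = new Op("TS[1]");
        Op fil = ts.add(new Op("FIL[17]"));
        fil.add(new Op("RS[4]")).add(new Op("JOIN[5]"));
        fil.add(new Op("SEL[18]")).add(new Op("GBY[19]")).add(new Op("SPARKPRUNINGSINK[20]"));
        fil.add(new Op("SEL[21]")).add(new Op("GBY[22]")).add(new Op("SPARKPRUNINGSINK[23]"));
        List<List<String>> out = new ArrayList<>();
        split(ts, new ArrayList<>(), out);
        return out;
    }

    public static void main(String[] args) {
        // Prints the three split works, matching the trees in the comment.
        for (List<String> work : splitPaths()) {
            System.out.println(String.join("-", work));
        }
    }
}
```

Running this prints the same three works shown in the "after split" block above, which makes the problem visible: TS[1] appears three times, one scan per work.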
If you have any ideas about this, please share your suggestions.
> Combine op trees for partition info generating tasks [Spark branch]
> -------------------------------------------------------------------
>
> Key: HIVE-11297
> URL: https://issues.apache.org/jira/browse/HIVE-11297
> Project: Hive
> Issue Type: Bug
> Affects Versions: spark-branch
> Reporter: Chao Sun
> Assignee: liyunzhang_intel
> Attachments: HIVE-11297.1.patch
>
>
> Currently, for dynamic partition pruning in Spark, if a small table generates
> partition info for more than one partition column, multiple operator trees
> are created, all starting from the same table scan op but with different
> Spark partition pruning sinks.
> As an optimization, we can combine these op trees so that we don't have to
> scan the table multiple times.
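The proposed combination can be sketched as a prefix merge: op-tree paths that share a common prefix are folded into one tree, so the table scan appears once however many pruning sinks hang off it. This is a hypothetical model (the `combine`/`countOp` helpers are invented for illustration), not Hive's actual implementation:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not Hive code): merge op-tree paths that share a common
// prefix into one nested-map "tree", so TS[1] occurs once instead of once per
// pruning-sink work.
public class CombineSketch {
    @SuppressWarnings("unchecked")
    static Map<String, Object> combine(List<List<String>> paths) {
        Map<String, Object> root = new LinkedHashMap<>();
        for (List<String> path : paths) {
            Map<String, Object> node = root;
            for (String op : path) {
                // Reuse an existing child for a shared prefix; create otherwise.
                node = (Map<String, Object>) node.computeIfAbsent(
                        op, k -> new LinkedHashMap<String, Object>());
            }
        }
        return root;
    }

    // Count occurrences of an operator name in the combined tree.
    @SuppressWarnings("unchecked")
    static int countOp(Map<String, Object> node, String name) {
        int n = 0;
        for (Map.Entry<String, Object> e : node.entrySet()) {
            if (e.getKey().equals(name)) n++;
            n += countOp((Map<String, Object>) e.getValue(), name);
        }
        return n;
    }

    public static void main(String[] args) {
        // The two pruning-sink works produced by the split.
        List<List<String>> works = Arrays.asList(
            Arrays.asList("TS[1]", "FIL[17]", "SEL[18]", "GBY[19]", "SPARKPRUNINGSINK[20]"),
            Arrays.asList("TS[1]", "FIL[17]", "SEL[21]", "GBY[22]", "SPARKPRUNINGSINK[23]"));
        Map<String, Object> combined = combine(works);
        // After combining, the shared TS[1]-FIL[17] prefix is kept once.
        System.out.println("table scans before: " + works.size()
                + ", after: " + countOp(combined, "TS[1]"));
    }
}
```

The combined tree keeps one TS[1]-FIL[17] prefix feeding both GBY/pruning-sink branches, which is exactly the single-scan shape the optimization aims for.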
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)