[jira] [Updated] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]

liyunzhang_intel (JIRA) Thu, 15 Jun 2017 01:32:25 -0700

    [ https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


liyunzhang_intel updated HIVE-11297:
------------------------------------
    Attachment: HIVE-11297.4.patch

[~csun]: updated HIVE-11297.4.patch to address the points you raised on RB (Review Board).
{noformat}
TS1       TS2
 |         |
FIL1      FIL2
 |         |
 RS       SEL
 |       / | \
 |     RS SEL SEL
  \   /    |   |
   JOIN   GBY GBY
           |   |
           |  SPARKPRUNINGSINK
           |
   SPARKPRUNINGSINK
{noformat}
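Current algorithm:
1. Find the filter FIL2, traverse each branch below FIL2, and collect the children that start branches containing a SPARKPRUNINGSINK.
2. Split the tree into two separate trees: one keeping the branch that feeds the JOIN, and one holding the SPARKPRUNINGSINK branches.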
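The two steps above can be sketched roughly as follows. This is a minimal illustration, not the code in HIVE-11297.4.patch: `Op`, `PruningSplitter`, `findFork` and `split` are hypothetical names, operators are matched only by name, and the sketch assumes every operator above the fork has a single parent. Starting at the filter, it walks down to the first operator with several children (the SEL below FIL2 in the diagram), leaves the branch that never reaches a SPARKPRUNINGSINK in the original tree, and moves the pruning branches onto a copied TS2 -> FIL2 -> SEL spine.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified operator node (not Hive's Operator class).
class Op {
    final String name;
    final List<Op> parents = new ArrayList<>();
    final List<Op> children = new ArrayList<>();

    Op(String name) {
        this.name = name;
    }

    // Links child below this operator and returns the child for chaining.
    Op addChild(Op child) {
        children.add(child);
        child.parents.add(this);
        return child;
    }

    // True if any operator in this subtree is a pruning sink.
    boolean containsPruningSink() {
        if (name.equals("SPARKPRUNINGSINK")) {
            return true;
        }
        for (Op c : children) {
            if (c.containsPruningSink()) {
                return true;
            }
        }
        return false;
    }
}

class PruningSplitter {
    // Walks down from the filter through single-child operators to the first fork.
    static Op findFork(Op filter) {
        Op cur = filter;
        while (cur.children.size() == 1) {
            cur = cur.children.get(0);
        }
        return cur;
    }

    // Step 1: collect the fork's children whose branches contain a pruning sink.
    // Step 2: move them onto a copy of the TS -> ... -> fork spine and return
    // the new tree's root; the original tree keeps the remaining branches.
    // Assumes each operator above the fork has exactly one parent.
    static Op split(Op filter) {
        Op fork = findFork(filter);
        List<Op> pruning = new ArrayList<>();
        for (Op c : fork.children) {
            if (c.containsPruningSink()) {
                pruning.add(c);
            }
        }

        // Path from the table scan down to the fork, root first.
        List<Op> spine = new ArrayList<>();
        Op cur = fork;
        while (cur != null) {
            spine.add(0, cur);
            cur = cur.parents.isEmpty() ? null : cur.parents.get(0);
        }

        // Copy the spine (names only) so the new tree gets its own scan and filter.
        Op newRoot = new Op(spine.get(0).name);
        Op newFork = newRoot;
        for (int i = 1; i < spine.size(); i++) {
            newFork = newFork.addChild(new Op(spine.get(i).name));
        }

        // Re-parent the pruning branches under the copied fork.
        for (Op branch : pruning) {
            fork.children.remove(branch);
            branch.parents.remove(fork);
            newFork.addChild(branch);
        }
        return newRoot;
    }
}
```

Built from the diagram, `PruningSplitter.split(fil2)` leaves TS2 -> FIL2 -> SEL -> RS feeding the JOIN and returns a new TS2 root whose SEL holds both GBY -> SPARKPRUNINGSINK branches. Only operator names are copied here; the real operators carry much more state.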

> Combine op trees for partition info generating tasks [Spark branch]
> -------------------------------------------------------------------
>
>                 Key: HIVE-11297
>                 URL: https://issues.apache.org/jira/browse/HIVE-11297
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: spark-branch
>            Reporter: Chao Sun
>            Assignee: liyunzhang_intel
>         Attachments: HIVE-11297.1.patch, HIVE-11297.2.patch, HIVE-11297.3.patch
>
>
> Currently, for dynamic partition pruning in Spark, if a small table generates
> partition info for more than one partition column, multiple operator trees
> are created, which all start from the same table scan op but have different
> Spark partition pruning sinks.
> As an optimization, we can combine these op trees so that the table does not
> have to be scanned multiple times.
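To make the idea in the description concrete, here is a rough sketch of combining the trees under simplifying assumptions: each pruning op tree is written as a chain of operator labels starting at the table scan, and equal labels at the same position mean the operator can be shared. `PlanNode` and `TreeCombiner` are hypothetical names, and the `ds`/`hr` partition columns in the example are made up. Merging the chains like a prefix tree keeps one copy of the shared TS -> FIL -> SEL prefix, so the table is scanned once while each partition column keeps its own GBY and sink.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical node of the combined tree; children are keyed by operator label.
class PlanNode {
    final String label;
    final Map<String, PlanNode> children = new LinkedHashMap<>();

    PlanNode(String label) {
        this.label = label;
    }

    // Number of operators in this subtree.
    int size() {
        int total = 1;
        for (PlanNode c : children.values()) {
            total += c.size();
        }
        return total;
    }
}

class TreeCombiner {
    // Merges operator chains that all start at the same table scan into one
    // tree; equal label prefixes are shared, so the scan appears only once.
    static PlanNode combine(List<List<String>> chains) {
        PlanNode root = null;
        for (List<String> chain : chains) {
            if (root == null) {
                root = new PlanNode(chain.get(0));
            } else if (!root.label.equals(chain.get(0))) {
                throw new IllegalArgumentException("chains must start at the same table scan");
            }
            PlanNode cur = root;
            for (String label : chain.subList(1, chain.size())) {
                cur = cur.children.computeIfAbsent(label, PlanNode::new);
            }
        }
        return root;
    }
}
```

For the chains `TS2, FIL2, SEL, GBY(ds), SPARKPRUNINGSINK(ds)` and `TS2, FIL2, SEL, GBY(hr), SPARKPRUNINGSINK(hr)`, `combine` returns one 7-operator tree instead of two 5-operator trees.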



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
