[jira] [Updated] (PIG-4871) Not use OperatorPlan#forceConnect in MultiQueryOptimizationSpark

liyunzhang_intel (JIRA) Fri, 15 Apr 2016 02:22:45 -0700

     [ 
https://issues.apache.org/jira/browse/PIG-4871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


liyunzhang_intel updated PIG-4871:
----------------------------------
    Attachment: PIG-4871.patch

>  Not use OperatorPlan#forceConnect in MultiQueryOptimizationSpark
> -----------------------------------------------------------------
>
>                 Key: PIG-4871
>                 URL: https://issues.apache.org/jira/browse/PIG-4871
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>             Fix For: spark-branch
>
>         Attachments: PIG-4871.patch
>
>
> In current code base, we use OperatorPlan#forceConnect() while merge the 
> physical plan of spliter and splittee in MultiQueryOptimizationSpark.
> The difference between OperatorPlan#connect and OperatorPlan#forceConnect is 
> not checking whether support multiOutputs and multiInputs or not in 
> forceConnect.
> {code}
>  /**
>      * connect from and to and ignore some judgements: like ignoring judge 
> whether from operator supports multiOutputs
>      * and whether to operator supports multiInputs
>      *
>      * @param from Operator data will flow from.
>      * @param to   Operator data will flow to.
>      * @throws PlanException if connect from or to which is not in the plan
>      */
>     public void forceConnect(E from, E to) throws PlanException {
>         markDirty();
>         // Check that both nodes are in the plan.
>         checkInPlan(from);
>         checkInPlan(to);
>         mFromEdges.put(from, to);
>         mToEdges.put(to, from);
>     }
> {code}
> Let's use an example to explain why add forceConnect before.
> {code}
> before multiquery optimization:
> scope-102->scope-108 
> scope-108
> #--------------------------------------------------
> # Spark Plan                                  
> #--------------------------------------------------
> Spark node scope-102
> Store(hdfs://zly1.sh.intel.com:8020/tmp/temp1293488480/tmp128312889:org.apache.pig.impl.io.InterStorage)
>  - scope-103
> |
> |---A: New For Each(false,false,false)[bag] - scope-75
>     |   |
>     |   Cast[int] - scope-67
>     |   |
>     |   |---Project[bytearray][0] - scope-66
>     |   |
>     |   Cast[int] - scope-70
>     |   |
>     |   |---Project[bytearray][1] - scope-69
>     |   |
>     |   Cast[int] - scope-73
>     |   |
>     |   |---Project[bytearray][2] - scope-72
>     |
>     |---A: 
> Load(hdfs://zly1.sh.intel.com:8020/user/root/split5:org.apache.pig.builtin.PigStorage)
>  - scope-65--------
> Spark node scope-108
> D: 
> Store(hdfs://zly1.sh.intel.com:8020/user/root/split5.out:org.apache.pig.builtin.PigStorage)
>  - scope-101
> |
> |---D: Filter[bag] - scope-97
>     |   |
>     |   Greater Than or Equal[boolean] - scope-100
>     |   |
>     |   |---Project[int][1] - scope-98
>     |   |
>     |   |---Project[int][4] - scope-99
>     |
>     |---C: New For Each(true,true)[tuple] - scope-96
>         |   |
>         |   Project[bag][1] - scope-94
>         |   |
>         |   Project[bag][2] - scope-95
>         |
>         |---C: Package(Packager)[tuple]{int} - scope-89
>             |
>             |---C: Global Rearrange[tuple] - scope-88
>                 |
>                 |---C: Local Rearrange[tuple]{int}(false) - scope-90
>                 |   |   |
>                 |   |   Project[int][0] - scope-91
>                 |   |
>                 |   
> |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp1293488480/tmp128312889:org.apache.pig.impl.io.InterStorage)
>  - scope-104
>                 |
>                 |---C: Local Rearrange[tuple]{int}(false) - scope-92
>                     |   |
>                     |   Project[int][0] - scope-93
>                     |
>                     |---B: New For Each(false,false)[bag] - scope-85
>                         |   |
>                         |   Project[int][0] - scope-81
>                         |   |
>                         |   Project[int][1] - scope-83
>                         |
>                         
> |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp1293488480/tmp128312889:org.apache.pig.impl.io.InterStorage)
>  - scope-106--------
> after multiquery optimization:
> scope-108
> #--------------------------------------------------
> # Spark Plan                                  
> #--------------------------------------------------
> Spark node scope-108
> D: 
> Store(hdfs://zly1.sh.intel.com:8020/user/root/split5.out:org.apache.pig.builtin.PigStorage)
>  - scope-101
> |
> |---D: Filter[bag] - scope-97
>     |   |
>     |   Greater Than or Equal[boolean] - scope-100
>     |   |
>     |   |---Project[int][1] - scope-98
>     |   |
>     |   |---Project[int][4] - scope-99
>     |
>     |---C: New For Each(true,true)[tuple] - scope-96
>         |   |
>         |   Project[bag][1] - scope-94
>         |   |
>         |   Project[bag][2] - scope-95
>         |
>         |---C: Package(Packager)[tuple]{int} - scope-89
>             |
>             |---C: Global Rearrange[tuple] - scope-88
>                 |
>                 |---C: Local Rearrange[tuple]{int}(false) - scope-90
>                 |   |   |
>                 |   |   Project[int][0] - scope-91
>                 |   |
>                 |   |---A: New For Each(false,false,false)[bag] - scope-119
>                 |       |   |
>                 |       |   Cast[int] - scope-114
>                 |       |   |
>                 |       |   |---Project[bytearray][0] - scope-113
>                 |       |   |
>                 |       |   Cast[int] - scope-116
>                 |       |   |
>                 |       |   |---Project[bytearray][1] - scope-115
>                 |       |   |
>                 |       |   Cast[int] - scope-118
>                 |       |   |
>                 |       |   |---Project[bytearray][2] - scope-117
>                 |       |
>                 |       |---A: 
> Load(hdfs://zly1.sh.intel.com:8020/user/root/split5:org.apache.pig.builtin.PigStorage)
>  - scope-112
>                 |
>                 |---C: Local Rearrange[tuple]{int}(false) - scope-92
>                     |   |
>                     |   Project[int][0] - scope-93
>                     |
>                     |---B: New For Each(false,false)[bag] - scope-85
>                         |   |
>                         |   Project[int][0] - scope-81
>                         |   |
>                         |   Project[int][1] - scope-83
>                         |
>                         |---A: New For Each(false,false,false)[bag] - 
> scope-128
>                             |   |
>                             |   Cast[int] - scope-123
>                             |   |
>                             |   |---Project[bytearray][0] - scope-122
>                             |   |
>                             |   Cast[int] - scope-125
>                             |   |
>                             |   |---Project[bytearray][1] - scope-124
>                             |   |
>                             |   Cast[int] - scope-127
>                             |   |
>                             |   |---Project[bytearray][2] - scope-126
>                             |
>                             |---A: 
> Load(hdfs://zly1.sh.intel.com:8020/user/root/split5:org.apache.pig.builtin.PigStorage)
>  - scope-121--------
> {code}
> we just copy subplan(Load:scope-121) of  SparkOperator(scope-102) to 
> SparkOperator(scope-108), connect POLoad(scope-121) with POForEach(scope-119) 
> and POForEach(scope-128). Here POLoad supports multiOutputs. we can use 
> OperatorPlan#connect. If a physical operator does not support multiOutputs, 
> we have to use OperatorPlan#forceConnect to do the connection.  Now we can 
> copy the subplan and generate a new subplan and merge the subplan to the 
> final plan to do multiquery optimization like following to solve this problem.
> {code}
> scope-108
> #--------------------------------------------------
> # Spark Plan                                  
> #--------------------------------------------------
> Spark node scope-108
> D: 
> Store(hdfs://zly1.sh.intel.com:8020/user/root/split5.out:org.apache.pig.builtin.PigStorage)
>  - scope-101
> |
> |---D: Filter[bag] - scope-97
>     |   |
>     |   Greater Than or Equal[boolean] - scope-100
>     |   |
>     |   |---Project[int][1] - scope-98
>     |   |
>     |   |---Project[int][4] - scope-99
>     |
>     |---C: New For Each(true,true)[tuple] - scope-96
>         |   |
>         |   Project[bag][1] - scope-94
>         |   |
>         |   Project[bag][2] - scope-95
>         |
>         |---C: Package(Packager)[tuple]{int} - scope-89
>             |
>             |---C: Global Rearrange[tuple] - scope-88
>                 |
>                 |---C: Local Rearrange[tuple]{int}(false) - scope-90
>                 |   |   |
>                 |   |   Project[int][0] - scope-91
>                 |   |
>                 |   |---A: New For Each(false,false,false)[bag] - scope-119
>                 |       |   |
>                 |       |   Cast[int] - scope-114
>                 |       |   |
>                 |       |   |---Project[bytearray][0] - scope-113
>                 |       |   |
>                 |       |   Cast[int] - scope-116
>                 |       |   |
>                 |       |   |---Project[bytearray][1] - scope-115
>                 |       |   |
>                 |       |   Cast[int] - scope-118
>                 |       |   |
>                 |       |   |---Project[bytearray][2] - scope-117
>                 |       |
>                 |       |---A: 
> Load(hdfs://zly1.sh.intel.com:8020/user/root/split5:org.apache.pig.builtin.PigStorage)
>  - scope-112
>                 |
>                 |---C: Local Rearrange[tuple]{int}(false) - scope-92
>                     |   |
>                     |   Project[int][0] - scope-93
>                     |
>                     |---B: New For Each(false,false)[bag] - scope-85
>                         |   |
>                         |   Project[int][0] - scope-81
>                         |   |
>                         |   Project[int][1] - scope-83
>                         |
>                         |---A: New For Each(false,false,false)[bag] - 
> scope-128
>                             |   |
>                             |   Cast[int] - scope-123
>                             |   |
>                             |   |---Project[bytearray][0] - scope-122
>                             |   |
>                             |   Cast[int] - scope-125
>                             |   |
>                             |   |---Project[bytearray][1] - scope-124
>                             |   |
>                             |   Cast[int] - scope-127
>                             |   |
>                             |   |---Project[bytearray][2] - scope-126
>                             |
>                             |---A: 
> Load(hdfs://zly1.sh.intel.com:8020/user/root/split5:org.apache.pig.builtin.PigStorage)
>  - scope-132--------
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Updated] (PIG-4871) Not use OperatorPlan#forceConnect in MultiQueryOptimizationSpark

Reply via email to