[ 
https://issues.apache.org/jira/browse/PIG-4839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel updated PIG-4839:
----------------------------------
    Parent Issue: PIG-4856  (was: PIG-4059)

> MultiQueryOptimizerSpark doesn't remove all redudant nodes in spark plan
> ------------------------------------------------------------------------
>
>                 Key: PIG-4839
>                 URL: https://issues.apache.org/jira/browse/PIG-4839
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>             Fix For: spark-branch
>
>
> TestMultiQueryBasic#testMultiQueryWithFJ_2
> {code}
> a = load './passwd' using PigStorage(':') as 
> (uname:chararray,passwd:chararray, uid:int, gid:int);
> b = load './passwd' using PigStorage(':') as 
> (uname:chararray,passwd:chararray, uid:int, gid:int);
> c = filter a by uid > 5;
> store c into './multiQueryFJ.output';
> d = filter b by gid > 10;
> store d into './multiQueryFJ.output.2';
> e = join c by gid, d by gid using 'repl';
> store e into './multiQueryFJ.output.3';
> {code}
> The spark plan:
> {code}
> before multiquery optimization:
> scope-57->scope-60 scope-66
> scope-60
> scope-61->scope-64 scope-68
> scope-64
> scope-66
> scope-68->scope-66
> #--------------------------------------------------
> # Spark Plan                                 
> #--------------------------------------------------
> Spark node scope-61
> Store(hdfs://zly1.sh.intel.com:8020/tmp/temp-1338833908/tmp-1814908586:org.apache.pig.impl.io.InterStorage)
>  - scope-62
> |
> |---d: Filter[bag] - scope-36
>     |   |
>     |   Greater Than[boolean] - scope-39
>     |   |
>     |   |---Project[int][3] - scope-37
>     |   |
>     |   |---Constant(10) - scope-38
>     |
>     |---b: New For Each(false,false,false,false)[bag] - scope-35
>         |   |
>         |   Cast[chararray] - scope-24
>         |   |
>         |   |---Project[bytearray][0] - scope-23
>         |   |
>         |   Cast[chararray] - scope-27
>         |   |
>         |   |---Project[bytearray][1] - scope-26
>         |   |
>         |   Cast[int] - scope-30
>         |   |
>         |   |---Project[bytearray][2] - scope-29
>         |   |
>         |   Cast[int] - scope-33
>         |   |
>         |   |---Project[bytearray][3] - scope-32
>         |
>         |---b: 
> Load(hdfs://zly1.sh.intel.com:8020/user/root/passwd:PigStorage(':')) - 
> scope-22--------
> Spark node scope-64
> d: 
> Store(hdfs://zly1.sh.intel.com:8020/user/root/multiQueryFJ.output.2:org.apache.pig.builtin.PigStorage)
>  - scope-43
> |
> |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-1338833908/tmp-1814908586:org.apache.pig.impl.io.InterStorage)
>  - scope-63--------
> Spark node scope-68
> Store(hdfs://zly1.sh.intel.com:8020/tmp/temp-1338833908/tmp-1233897062:org.apache.pig.impl.io.InterStorage)
>  - scope-69
> |
> |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-1338833908/tmp-1814908586:org.apache.pig.impl.io.InterStorage)
>  - scope-67--------
> Spark node scope-66
> e: 
> Store(hdfs://zly1.sh.intel.com:8020/user/root/multiQueryFJ.output.3:org.apache.pig.builtin.PigStorage)
>  - scope-56
> |
> |---e: FRJoin[tuple] - scope-50
>     |   |
>     |   Project[int][3] - scope-48
>     |   |
>     |   Project[int][3] - scope-49
>     |
>     
> |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-1338833908/tmp929915440:org.apache.pig.impl.io.InterStorage)
>  - scope-65--------
> Spark node scope-57
> Store(hdfs://zly1.sh.intel.com:8020/tmp/temp-1338833908/tmp929915440:org.apache.pig.impl.io.InterStorage)
>  - scope-58
> |
> |---c: Filter[bag] - scope-14
>     |   |
>     |   Greater Than[boolean] - scope-17
>     |   |
>     |   |---Project[int][2] - scope-15
>     |   |
>     |   |---Constant(5) - scope-16
>     |
>     |---a: New For Each(false,false,false,false)[bag] - scope-13
>         |   |
>         |   Cast[chararray] - scope-2
>         |   |
>         |   |---Project[bytearray][0] - scope-1
>         |   |
>         |   Cast[chararray] - scope-5
>         |   |
>         |   |---Project[bytearray][1] - scope-4
>         |   |
>         |   Cast[int] - scope-8
>         |   |
>         |   |---Project[bytearray][2] - scope-7
>         |   |
>         |   Cast[int] - scope-11
>         |   |
>         |   |---Project[bytearray][3] - scope-10
>         |
>         |---a: 
> Load(hdfs://zly1.sh.intel.com:8020/user/root/passwd:PigStorage(':')) - 
> scope-0--------
> Spark node scope-60
> c: 
> Store(hdfs://zly1.sh.intel.com:8020/user/root/multiQueryFJ.output:org.apache.pig.builtin.PigStorage)
>  - scope-21
> |
> |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-1338833908/tmp929915440:org.apache.pig.impl.io.InterStorage)
>  - scope-59--------
> {code}
> After spark multiquery optimization, 6 spark nodes will be reduced to 4.
> scope-60 should be combined with scope-57 but not.
> {code}
> scope-57->scope-60 scope-66 
> scope-60
> scope-61->scope-66
> scope-66
> #--------------------------------------------------
> # Spark Plan                                 
> #--------------------------------------------------
> Spark node scope-61
> Split - scope-70
> |   |
> |   d: 
> Store(hdfs://zly1.sh.intel.com:8020/user/root/multiQueryFJ.output.2:org.apache.pig.builtin.PigStorage)
>  - scope-43
> |   |
> |   
> Store(hdfs://zly1.sh.intel.com:8020/tmp/temp-1338833908/tmp-1233897062:org.apache.pig.impl.io.InterStorage)
>  - scope-69
> |
> |---d: Filter[bag] - scope-36
>     |   |
>     |   Greater Than[boolean] - scope-39
>     |   |
>     |   |---Project[int][3] - scope-37
>     |   |
>     |   |---Constant(10) - scope-38
>     |
>     |---b: New For Each(false,false,false,false)[bag] - scope-35
>         |   |
>         |   Cast[chararray] - scope-24
>         |   |
>         |   |---Project[bytearray][0] - scope-23
>         |   |
>         |   Cast[chararray] - scope-27
>         |   |
>         |   |---Project[bytearray][1] - scope-26
>         |   |
>         |   Cast[int] - scope-30
>         |   |
>         |   |---Project[bytearray][2] - scope-29
>         |   |
>         |   Cast[int] - scope-33
>         |   |
>         |   |---Project[bytearray][3] - scope-32
>         |
>         |---b: 
> Load(hdfs://zly1.sh.intel.com:8020/user/root/passwd:PigStorage(':')) - 
> scope-22--------
> Spark node scope-66
> e: 
> Store(hdfs://zly1.sh.intel.com:8020/user/root/multiQueryFJ.output.3:org.apache.pig.builtin.PigStorage)
>  - scope-56
> |
> |---e: FRJoin[tuple] - scope-50
>     |   |
>     |   Project[int][3] - scope-48
>     |   |
>     |   Project[int][3] - scope-49
>     |
>     
> |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-1338833908/tmp929915440:org.apache.pig.impl.io.InterStorage)
>  - scope-65--------
> Spark node scope-57
> Store(hdfs://zly1.sh.intel.com:8020/tmp/temp-1338833908/tmp929915440:org.apache.pig.impl.io.InterStorage)
>  - scope-58
> |
> |---c: Filter[bag] - scope-14
>     |   |
>     |   Greater Than[boolean] - scope-17
>     |   |
>     |   |---Project[int][2] - scope-15
>     |   |
>     |   |---Constant(5) - scope-16
>     |
>     |---a: New For Each(false,false,false,false)[bag] - scope-13
>         |   |
>         |   Cast[chararray] - scope-2
>         |   |
>         |   |---Project[bytearray][0] - scope-1
>         |   |
>         |   Cast[chararray] - scope-5
>         |   |
>         |   |---Project[bytearray][1] - scope-4
>         |   |
>         |   Cast[int] - scope-8
>         |   |
>         |   |---Project[bytearray][2] - scope-7
>         |   |
>         |   Cast[int] - scope-11
>         |   |
>         |   |---Project[bytearray][3] - scope-10
>         |
>         |---a: 
> Load(hdfs://zly1.sh.intel.com:8020/user/root/passwd:PigStorage(':')) - 
> scope-0--------
> Spark node scope-60
> c: 
> Store(hdfs://zly1.sh.intel.com:8020/user/root/multiQueryFJ.output:org.apache.pig.builtin.PigStorage)
>  - scope-21
> |
> |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-1338833908/tmp929915440:org.apache.pig.impl.io.InterStorage)
>  - scope-59--------
> {code}
> Following is mr plan after multiquery optimization
> {code}
> scope57->scope-66
> scope-61->scope-66
> scope-66
> #--------------------------------------------------
> # Map Reduce Plan                                 
> #--------------------------------------------------
> MapReduce node scope-61
> Map Plan
> Split - scope-70
> |   |
> |   d: 
> Store(hdfs://zly1.sh.intel.com:8020/user/root/multiQueryFJ.output.2:org.apache.pig.builtin.PigStorage)
>  - scope-43
> |   |
> |   
> Store(hdfs://zly1.sh.intel.com:8020/tmp/temp411366696/tmp-223707761:org.apache.pig.impl.io.InterStorage)
>  - scope-69
> |
> |---d: Filter[bag] - scope-36
>     |   |
>     |   Greater Than[boolean] - scope-39
>     |   |
>     |   |---Project[int][3] - scope-37
>     |   |
>     |   |---Constant(10) - scope-38
>     |
>     |---b: New For Each(false,false,false,false)[bag] - scope-35
>         |   |
>         |   Cast[chararray] - scope-24
>         |   |
>         |   |---Project[bytearray][0] - scope-23
>         |   |
>         |   Cast[chararray] - scope-27
>         |   |
>         |   |---Project[bytearray][1] - scope-26
>         |   |
>         |   Cast[int] - scope-30
>         |   |
>         |   |---Project[bytearray][2] - scope-29
>         |   |
>         |   Cast[int] - scope-33
>         |   |
>         |   |---Project[bytearray][3] - scope-32
>         |
>         |---b: 
> Load(hdfs://zly1.sh.intel.com:8020/user/root/passwd:PigStorage(':')) - 
> scope-22--------
> Global sort: false
> ----------------
> MapReduce node scope-66
> Map Plan
> e: 
> Store(hdfs://zly1.sh.intel.com:8020/user/root/multiQueryFJ.output.3:org.apache.pig.builtin.PigStorage)
>  - scope-56
> |
> |---e: FRJoin[tuple] - scope-50
>     |   |
>     |   Project[int][3] - scope-48
>     |   |
>     |   Project[int][3] - scope-49
>     |
>     
> |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp411366696/tmp-729323405:org.apache.pig.impl.io.InterStorage)
>  - scope-65--------
> Global sort: false
> ----------------
> MapReduce node scope-57
> Map Plan
> Split - scope-71
> |   |
> |   c: 
> Store(hdfs://zly1.sh.intel.com:8020/user/root/multiQueryFJ.output:org.apache.pig.builtin.PigStorage)
>  - scope-21
> |   |
> |   
> Store(hdfs://zly1.sh.intel.com:8020/tmp/temp411366696/tmp-729323405:org.apache.pig.impl.io.InterStorage)
>  - scope-58
> |
> |---c: Filter[bag] - scope-14
>     |   |
>     |   Greater Than[boolean] - scope-17
>     |   |
>     |   |---Project[int][2] - scope-15
>     |   |
>     |   |---Constant(5) - scope-16
>     |
>     |---a: New For Each(false,false,false,false)[bag] - scope-13
>         |   |
>         |   Cast[chararray] - scope-2
>         |   |
>         |   |---Project[bytearray][0] - scope-1
>         |   |
>         |   Cast[chararray] - scope-5
>         |   |
>         |   |---Project[bytearray][1] - scope-4
>         |   |
>         |   Cast[int] - scope-8
>         |   |
>         |   |---Project[bytearray][2] - scope-7
>         |   |
>         |   Cast[int] - scope-11
>         |   |
>         |   |---Project[bytearray][3] - scope-10
>         |
>         |---a: 
> Load(hdfs://zly1.sh.intel.com:8020/user/root/passwd:PigStorage(':')) - 
> scope-0--------
> Global sort: false
> ----------------
> {code}
> After this jjra is fixed, [the 
> modification|https://github.com/apache/pig/blob/spark/test/org/apache/pig/test/TestPigRunner.java#L458]
>  in TestPigRunner can be removed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to