[ https://issues.apache.org/jira/browse/PIG-4594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039583#comment-15039583 ]
liyunzhang_intel commented on PIG-4594:
---------------------------------------
[~mohitsabharwal] and [~kexianda]:
Let me explain why we need to add a forceConnect method to PhysicalPlan.java:
{quote}
If we want to connect x to y and z, the MR implementation clones x into two
copies, x1 and x2. x1 will be connected to y, and x2 will be connected to z.
{quote}
I agree with Xianda. Here is an example:
{code}
cat bin/testMultiQueryJiraPig983_2.pig
a = load './passwd' using PigStorage(':') as (uname:chararray,
passwd:chararray, uid:int, gid:int);
b = filter a by uid < 5;
c = filter a by uid >= 5;
d = join b by uname, c by uname;
store d into './testMultiQueryJiraPig983_2.out';
{code}
As the result below shows, after multiquery optimization scope-57 and
scope-67 are actually the same operator: in the MR implementation, scope-67 is
a copy of scope-57, made to avoid the exception
"This operator does not support multiple outputs". As a result, we need to do
the load *twice* (scope-48 and scope-58) even though both loads are the *same*.
{code}
before multiquery optimization:
#--------------------------------------------------
# Map Reduce Plan
#--------------------------------------------------
MapReduce node scope-39
Map Plan
Store(hdfs://zly1.sh.intel.com:8020/tmp/temp45078980/tmp-1782295863:org.apache.pig.impl.io.InterStorage)
- scope-40
|
|---a: New For Each(false,false,false,false)[bag] - scope-13
| |
| Cast[chararray] - scope-2
| |
| |---Project[bytearray][0] - scope-1
| |
| Cast[chararray] - scope-5
| |
| |---Project[bytearray][1] - scope-4
| |
| Cast[int] - scope-8
| |
| |---Project[bytearray][2] - scope-7
| |
| Cast[int] - scope-11
| |
| |---Project[bytearray][3] - scope-10
|
|---a: Load(hdfs://zly1.sh.intel.com:8020/user/root/passwd:PigStorage(':'))
- scope-0--------
Global sort: false
----------------
MapReduce node scope-45
Map Plan
Union[tuple] - scope-46
|
|---d: Local Rearrange[tuple]{chararray}(false) - scope-31
| | |
| | Project[chararray][0] - scope-32
| |
| |---b: Filter[bag] - scope-17
| | |
| | Less Than[boolean] - scope-20
| | |
| | |---Project[int][2] - scope-18
| | |
| | |---Constant(5) - scope-19
| |
|
|---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp45078980/tmp-1782295863:org.apache.pig.impl.io.InterStorage)
- scope-41
|
|---d: Local Rearrange[tuple]{chararray}(false) - scope-33
| |
| Project[chararray][0] - scope-34
|
|---c: Filter[bag] - scope-23
| |
| Greater Than or Equal[boolean] - scope-26
| |
| |---Project[int][2] - scope-24
| |
| |---Constant(5) - scope-25
|
|---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp45078980/tmp-1782295863:org.apache.pig.impl.io.InterStorage)
- scope-43--------
Reduce Plan
d:
Store(hdfs://zly1.sh.intel.com:8020/user/root/testMultiQueryJiraPig983_2.out:org.apache.pig.builtin.PigStorage)
- scope-38
|
|---d: Package(JoinPackager(true,true))[tuple]{chararray} - scope-30--------
Global sort: false
----------------
after multiquery optimization:
#--------------------------------------------------
# Map Reduce Plan
#--------------------------------------------------
MapReduce node scope-45
Map Plan
Union[tuple] - scope-46
|
|---d: Local Rearrange[tuple]{chararray}(false) - scope-31
| | |
| | Project[chararray][0] - scope-32
| |
| |---b: Filter[bag] - scope-17
| | |
| | Less Than[boolean] - scope-20
| | |
| | |---Project[int][2] - scope-18
| | |
| | |---Constant(5) - scope-19
| |
| |---a: New For Each(false,false,false,false)[bag] - scope-67
| | |
| | Cast[chararray] - scope-60
| | |
| | |---Project[bytearray][0] - scope-59
| | |
| | Cast[chararray] - scope-62
| | |
| | |---Project[bytearray][1] - scope-61
| | |
| | Cast[int] - scope-64
| | |
| | |---Project[bytearray][2] - scope-63
| | |
| | Cast[int] - scope-66
| | |
| | |---Project[bytearray][3] - scope-65
| |
| |---a:
Load(hdfs://zly1.sh.intel.com:8020/user/root/passwd:PigStorage(':')) - scope-58
|
|---d: Local Rearrange[tuple]{chararray}(false) - scope-33
| |
| Project[chararray][0] - scope-34
|
|---c: Filter[bag] - scope-23
| |
| Greater Than or Equal[boolean] - scope-26
| |
| |---Project[int][2] - scope-24
| |
| |---Constant(5) - scope-25
|
|---a: New For Each(false,false,false,false)[bag] - scope-57
| |
| Cast[chararray] - scope-50
| |
| |---Project[bytearray][0] - scope-49
| |
| Cast[chararray] - scope-52
| |
| |---Project[bytearray][1] - scope-51
| |
| Cast[int] - scope-54
| |
| |---Project[bytearray][2] - scope-53
| |
| Cast[int] - scope-56
| |
| |---Project[bytearray][3] - scope-55
|
|---a:
Load(hdfs://zly1.sh.intel.com:8020/user/root/passwd:PigStorage(':')) -
scope-48--------
Reduce Plan
d:
Store(hdfs://zly1.sh.intel.com:8020/user/root/testMultiQueryJiraPig983_2.out:org.apache.pig.builtin.PigStorage)
- scope-38
|
|---d: Package(JoinPackager(true,true))[tuple]{chararray} - scope-30--------
Global sort: false
----------------
{code}
*But* with multiquery optimization in Spark mode we *don't* need to do the load
*twice*; there is only one load (*scope-0*).
The result in Spark mode:
{code}
before multiquery optimization:
scope-39->scope-45
scope-45
#--------------------------------------------------
# Spark Plan
#--------------------------------------------------
Spark node scope-39
Store(hdfs://zly1.sh.intel.com:8020/tmp/temp1918416213/tmp1819996690:org.apache.pig.impl.io.InterStorage)
- scope-40
|
|---a: New For Each(false,false,false,false)[bag] - scope-13
| |
| Cast[chararray] - scope-2
| |
| |---Project[bytearray][0] - scope-1
| |
| Cast[chararray] - scope-5
| |
| |---Project[bytearray][1] - scope-4
| |
| Cast[int] - scope-8
| |
| |---Project[bytearray][2] - scope-7
| |
| Cast[int] - scope-11
| |
| |---Project[bytearray][3] - scope-10
|
|---a: Load(hdfs://zly1.sh.intel.com:8020/user/root/passwd:PigStorage(':'))
- scope-0--------
Spark node scope-45
d:
Store(hdfs://zly1.sh.intel.com:8020/user/root/testMultiQueryJiraPig983_2.out:org.apache.pig.builtin.PigStorage)
- scope-38
|
|---d: New For Each(true,true)[tuple] - scope-37
| |
| Project[bag][1] - scope-35
| |
| Project[bag][2] - scope-36
|
|---d: Package(Packager)[tuple]{chararray} - scope-30
|
|---d: Global Rearrange[tuple] - scope-29
|
|---d: Local Rearrange[tuple]{chararray}(false) - scope-31
| | |
| | Project[chararray][0] - scope-32
| |
| |---b: Filter[bag] - scope-17
| | |
| | Less Than[boolean] - scope-20
| | |
| | |---Project[int][2] - scope-18
| | |
| | |---Constant(5) - scope-19
| |
|
|---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp1918416213/tmp1819996690:org.apache.pig.impl.io.InterStorage)
- scope-41
|
|---d: Local Rearrange[tuple]{chararray}(false) - scope-33
| |
| Project[chararray][0] - scope-34
|
|---c: Filter[bag] - scope-23
| |
| Greater Than or Equal[boolean] - scope-26
| |
| |---Project[int][2] - scope-24
| |
| |---Constant(5) - scope-25
|
|---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp1918416213/tmp1819996690:org.apache.pig.impl.io.InterStorage)
- scope-43--------
after multiquery optimization:
scope-39
#--------------------------------------------------
# Spark Plan
#--------------------------------------------------
Spark node scope-39
d:
Store(hdfs://zly1.sh.intel.com:8020/user/root/testMultiQueryJiraPig983_2.out:org.apache.pig.builtin.PigStorage)
- scope-38
|
|---d: New For Each(true,true)[tuple] - scope-37
| |
| Project[bag][1] - scope-35
| |
| Project[bag][2] - scope-36
|
|---d: Package(Packager)[tuple]{chararray} - scope-30
|
|---d: Global Rearrange[tuple] - scope-29
|
|---d: Local Rearrange[tuple]{chararray}(false) - scope-31
| | |
| | Project[chararray][0] - scope-32
| |
| |---b: Filter[bag] - scope-17
| | |
| | Less Than[boolean] - scope-20
| | |
| | |---Project[int][2] - scope-18
| | |
| | |---Constant(5) - scope-19
| |
| |---a: New For Each(false,false,false,false)[bag] - scope-13
| | |
| | Cast[chararray] - scope-2
| | |
| | |---Project[bytearray][0] - scope-1
| | |
| | Cast[chararray] - scope-5
| | |
| | |---Project[bytearray][1] - scope-4
| | |
| | Cast[int] - scope-8
| | |
| | |---Project[bytearray][2] - scope-7
| | |
| | Cast[int] - scope-11
| | |
| | |---Project[bytearray][3] - scope-10
| |
| |---a:
Load(hdfs://zly1.sh.intel.com:8020/user/root/passwd:PigStorage(':')) - scope-0
|
|---d: Local Rearrange[tuple]{chararray}(false) - scope-33
| |
| Project[chararray][0] - scope-34
|
|---c: Filter[bag] - scope-23
| |
| Greater Than or Equal[boolean] - scope-26
| |
| |---Project[int][2] - scope-24
| |
| |---Constant(5) - scope-25
|
|---a: New For Each(false,false,false,false)[bag] - scope-13
| |
| Cast[chararray] - scope-2
| |
| |---Project[bytearray][0] - scope-1
| |
| Cast[chararray] - scope-5
| |
| |---Project[bytearray][1] - scope-4
| |
| Cast[int] - scope-8
| |
| |---Project[bytearray][2] - scope-7
| |
| Cast[int] - scope-11
| |
| |---Project[bytearray][3] - scope-10
|
|---a:
Load(hdfs://zly1.sh.intel.com:8020/user/root/passwd:PigStorage(':')) -
scope-0--------
{code}
The *forceConnect* method I added in PIG-4594.patch connects y and z to x
even though x does not support multiple outputs, because it *removes* the check
for whether the operator supports multiple outputs.
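To illustrate the difference between the two approaches, here is a minimal,
self-contained sketch. It is not the code from PIG-4594.patch and does not use
Pig's classes: ToyOperator, ToyPlan, copy and successorsOf are hypothetical
names invented for this example, and the exception type is an assumption. Only
the idea is taken from the discussion above: a checked connect rejects a second
output, the MR approach works around that by cloning x, and forceConnect skips
the check so a single x can feed both y and z.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a physical operator; not Pig's PhysicalOperator.
class ToyOperator {
    final String name;
    final boolean supportsMultipleOutputs;

    ToyOperator(String name, boolean supportsMultipleOutputs) {
        this.name = name;
        this.supportsMultipleOutputs = supportsMultipleOutputs;
    }

    // MR-style workaround: give each consumer its own copy of the producer.
    ToyOperator copy(String newName) {
        return new ToyOperator(newName, supportsMultipleOutputs);
    }
}

// Hypothetical stand-in for a plan; it only tracks successor edges.
class ToyPlan {
    private final Map<ToyOperator, List<ToyOperator>> successors = new LinkedHashMap<>();

    // Checked connect: refuses a second output edge unless the operator allows it.
    void connect(ToyOperator from, ToyOperator to) {
        List<ToyOperator> out = successors.computeIfAbsent(from, k -> new ArrayList<>());
        if (!out.isEmpty() && !from.supportsMultipleOutputs) {
            throw new IllegalStateException(
                from.name + ": This operator does not support multiple outputs");
        }
        out.add(to);
    }

    // Unchecked connect: same edge bookkeeping, but the multiple-outputs check is skipped.
    void forceConnect(ToyOperator from, ToyOperator to) {
        successors.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    List<ToyOperator> successorsOf(ToyOperator op) {
        return successors.getOrDefault(op, new ArrayList<>());
    }
}

class ForceConnectSketch {
    public static void main(String[] args) {
        ToyOperator x = new ToyOperator("x", false);
        ToyOperator y = new ToyOperator("y", false);
        ToyOperator z = new ToyOperator("z", false);

        // MR style: clone x so that each copy has exactly one output.
        ToyPlan mrPlan = new ToyPlan();
        mrPlan.connect(x.copy("x1"), y);
        mrPlan.connect(x.copy("x2"), z);

        // Spark style: keep a single x and force both edges onto it.
        ToyPlan sparkPlan = new ToyPlan();
        sparkPlan.forceConnect(x, y);
        sparkPlan.forceConnect(x, z);
        System.out.println("outputs of x: " + sparkPlan.successorsOf(x).size());
    }
}
```

In this sketch the cloned plan ends up with two separate producers, which
corresponds to the two loads in the MR plan, while the forced plan keeps one
producer with two consumers, which corresponds to the single scope-0 load in
the Spark plan.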
> Enable "TestMultiQuery" in spark mode
> -------------------------------------
>
> Key: PIG-4594
> URL: https://issues.apache.org/jira/browse/PIG-4594
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: liyunzhang_intel
> Assignee: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-4594-3.patch, PIG-4594.patch, PIG-4594_1.patch,
> PIG-4594_2.patch
>
>
> in https://builds.apache.org/job/Pig-spark/211/#showFailuresLink, it shows
> that the following unit tests fail:
> org.apache.pig.test.TestMultiQuery.testMultiQueryJiraPig1068
> org.apache.pig.test.TestMultiQuery.testMultiQueryJiraPig1157
> org.apache.pig.test.TestMultiQuery.testMultiQueryJiraPig1252
> org.apache.pig.test.TestMultiQuery.testMultiQueryJiraPig1438