order of relations in replicated join gets switched in a query where first
relation has two mergeable foreach statements
------------------------------------------------------------------------------------------------------------------------
Key: PIG-1672
URL: https://issues.apache.org/jira/browse/PIG-1672
Project: Pig
Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
Fix For: 0.8.0
A replicated join query ran out of memory because the order of the relations
was switched during logical plan optimization, so Pig attempted to load the
larger (left) relation into memory.
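To illustrate why the order of relations matters, here is a minimal Python sketch of a map-side (fragment-replicate) join. It is not Pig's actual implementation: the function name, parameters, and sample data are hypothetical and only show that the relation treated as "replicated" is held entirely in memory while the other one is streamed.

```python
from collections import defaultdict


def replicated_join(streamed, replicated, streamed_key, replicated_key):
    """Hypothetical map-side hash join.

    `replicated` is loaded fully into an in-memory hash table;
    `streamed` is read one row at a time and never held in memory.
    Returns the joined rows and the number of rows kept in memory.
    """
    table = defaultdict(list)
    rows_in_memory = 0
    for row in replicated:
        table[row[replicated_key]].append(row)
        rows_in_memory += 1

    joined = []
    for row in streamed:
        for match in table.get(row[streamed_key], []):
            joined.append(row + match)
    return joined, rows_in_memory


# Hypothetical sample data: a large left relation and a small right relation.
big = [("k%d" % (i % 3), i) for i in range(6)]
small = [("k0", "x"), ("k1", "y")]

# Intended order: only the small relation is held in memory.
joined, held = replicated_join(big, small, 0, 0)

# Switched order: the large relation ends up in memory instead.
_, held_switched = replicated_join(small, big, 0, 0)
```

With the intended order the in-memory table holds only the rows of the small relation. With the order switched, as in this bug, the table holds every row of the large relation, which is what exhausts memory on real data.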
{code}
cat replj.pig
l1 = load 'x' as (a);
l2 = load 'y' as (b);
l3 = load 'z' as (a1,b1,c1,d1);
f1 = foreach l3 generate a1 as a, b1 as b, c1 as c, d1 as d;
f2 = foreach f1 generate a,b,c;
j1 = join f2 by a, l1 by a using 'replicated';
j2 = join j1 by b, l2 by b using 'replicated';
explain j2;
{code}
Note that in the MR plan printed below, the Load in the MR job with the join
operations has 'x' as its input instead of 'z'.
{code}
#--------------------------------------------------
# Map Reduce Plan
#--------------------------------------------------
MapReduce node scope-30
Map Plan
Store(file:/tmp/temp101387354/tmp-125684214:org.apache.pig.impl.io.InterStorage) - scope-31
|
|---l2: Load(file:///Users/tejas/pig-0.8/branch-0.8/y:org.apache.pig.builtin.PigStorage) - scope-17--------
Global sort: false
----------------
MapReduce node scope-27
Map Plan
j2: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-26
|
|---j2: FRJoin[tuple] - scope-20
| |
| Project[bytearray][1] - scope-18
| |
| Project[bytearray][0] - scope-19
|
|---j1: FRJoin[tuple] - scope-11
| |
| Project[bytearray][0] - scope-9
| |
| Project[bytearray][0] - scope-10
|
|---l1: Load(file:///Users/tejas/pig-0.8/branch-0.8/x:org.apache.pig.builtin.PigStorage) - scope-0--------
Global sort: false
----------------
MapReduce node scope-28
Map Plan
Store(file:/tmp/temp101387354/tmp-890864787:org.apache.pig.impl.io.InterStorage) - scope-29
|
|---f2: New For Each(false,false,false)[bag] - scope-8
| |
| Project[bytearray][0] - scope-2
| |
| Project[bytearray][1] - scope-4
| |
| Project[bytearray][2] - scope-6
|
|---l3: Load(file:///Users/tejas/pig-0.8/branch-0.8/z:org.apache.pig.builtin.PigStorage) - scope-1--------
Global sort: false
----------------
{code}