order of relations in replicated join gets switched in a query where first 
relation has two mergeable foreach statements
------------------------------------------------------------------------------------------------------------------------

                 Key: PIG-1672
                 URL: https://issues.apache.org/jira/browse/PIG-1672
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.8.0
            Reporter: Thejas M Nair
            Assignee: Thejas M Nair
             Fix For: 0.8.0


The replicated join query was running out of memory because the order of 
relations got switched during logical plan optimization and it was attempting 
to load the larger (left) relation into memory.

{code}
cat replj.pig
l1 = load 'x' as (a);
l2 = load 'y' as (b);
l3 = load 'z' as (a1,b1,c1,d1);
f1 = foreach l3 generate a1 as a, b1 as b, c1 as c, d1 as d;
f2 = foreach f1 generate a,b,c; 
j1 = join f2 by a, l1 by a using 'replicated';
j2 = join j1 by b, l2 by b using 'replicated';
explain j2;


Note that in the MR plan printed below, the Load in the MR job with join 
operations has 'x' as the input instead of 'z' .

#--------------------------------------------------
# Map Reduce Plan                                  
#--------------------------------------------------
MapReduce node scope-30
Map Plan
Store(file:/tmp/temp101387354/tmp-125684214:org.apache.pig.impl.io.InterStorage)
 - scope-31
|
|---l2: 
Load(file:///Users/tejas/pig-0.8/branch-0.8/y:org.apache.pig.builtin.PigStorage)
 - scope-17--------
Global sort: false
----------------

MapReduce node scope-27
Map Plan
j2: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-26
|
|---j2: FRJoin[tuple] - scope-20
    |   |
    |   Project[bytearray][1] - scope-18
    |   |
    |   Project[bytearray][0] - scope-19
    |
    |---j1: FRJoin[tuple] - scope-11
        |   |
        |   Project[bytearray][0] - scope-9
        |   |
        |   Project[bytearray][0] - scope-10
        |
        |---l1: 
Load(file:///Users/tejas/pig-0.8/branch-0.8/x:org.apache.pig.builtin.PigStorage)
 - scope-0--------
Global sort: false
----------------

MapReduce node scope-28
Map Plan
Store(file:/tmp/temp101387354/tmp-890864787:org.apache.pig.impl.io.InterStorage)
 - scope-29
|
|---f2: New For Each(false,false,false)[bag] - scope-8
    |   |
    |   Project[bytearray][0] - scope-2
    |   |
    |   Project[bytearray][1] - scope-4
    |   |
    |   Project[bytearray][2] - scope-6
    |
    |---l3: 
Load(file:///Users/tejas/pig-0.8/branch-0.8/z:org.apache.pig.builtin.PigStorage)
 - scope-1--------
Global sort: false
----------------

{code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to