[ 
https://issues.apache.org/jira/browse/PIG-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1672:
-------------------------------

    Attachment: PIG-1672.2.patch

PIG-1672.2.patch
Added a test case.
I have also made minor changes to reduce the memory footprint of the right 
table, which should reduce the per-record overhead in the table by up to 
80 bytes. In POFRJoin.java, the ArrayList that stores the values for a key is 
now created with an initial capacity of 1, and the 'value' tuple stored in the 
hashmap is created with the expected number of fields.
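
To illustrate the two sizing changes described above, here is a minimal, 
self-contained sketch. It is not the code from POFRJoin.java: the class 
RightTableSketch and its methods are hypothetical names, and a plain 
List<Object> stands in for a Pig tuple. It only shows the idea of pre-sizing 
the per-key list to 1 and the value row to its known field count.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the in-memory right (replicated) table.
// Assumption: a List<Object> plays the role of a Pig tuple here.
class RightTableSketch {
    // Maps a join key to all right-side rows that share that key.
    private final Map<Object, List<List<Object>>> table = new HashMap<>();

    // Adds one row; keyIndex is the position of the join key within the row.
    void add(Object[] fields, int keyIndex) {
        // Size the value row to the known number of fields so that the
        // backing array has no unused slots.
        List<Object> value = new ArrayList<>(fields.length);
        Collections.addAll(value, fields);

        // Start the per-key list at capacity 1 instead of ArrayList's
        // default capacity of 10; it still grows if a key repeats.
        table.computeIfAbsent(fields[keyIndex], k -> new ArrayList<>(1))
             .add(value);
    }

    // Returns all rows stored for the key, or an empty list if none exist.
    List<List<Object>> lookup(Object key) {
        return table.getOrDefault(key, Collections.emptyList());
    }
}
```

The trade-off in this sketch is that a key with many rows pays for a few 
extra array resizes, while a table dominated by unique keys avoids carrying 
unused list slots for every record.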




> order of relations in replicated join gets switched in a query where first 
> relation has two mergeable foreach statements
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-1672
>                 URL: https://issues.apache.org/jira/browse/PIG-1672
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Thejas M Nair
>            Assignee: Thejas M Nair
>             Fix For: 0.8.0
>
>         Attachments: PIG-1672.1.patch, PIG-1672.2.patch
>
>
> The replicated join query was running out of memory because the order of 
> relations got switched during logical plan optimization and it was attempting 
> to load the larger (left) relation into memory.
> {code}
> cat replj.pig
> l1 = load 'x' as (a);
> l2 = load 'y' as (b);
> l3 = load 'z' as (a1,b1,c1,d1);
> f1 = foreach l3 generate a1 as a, b1 as b, c1 as c, d1 as d;
> f2 = foreach f1 generate a,b,c; 
> j1 = join f2 by a, l1 by a using 'replicated';
> j2 = join j1 by b, l2 by b using 'replicated';
> explain j2;
> {code}
> Note that in the MR plan printed below, the Load in the MR job that contains 
> the join operations has 'x' as its input instead of 'z'.
> {code}
> #--------------------------------------------------
> # Map Reduce Plan                                  
> #--------------------------------------------------
> MapReduce node scope-30
> Map Plan
> Store(file:/tmp/temp101387354/tmp-125684214:org.apache.pig.impl.io.InterStorage) - scope-31
> |
> |---l2: Load(file:///Users/tejas/pig-0.8/branch-0.8/y:org.apache.pig.builtin.PigStorage) - scope-17--------
> Global sort: false
> ----------------
> MapReduce node scope-27
> Map Plan
> j2: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-26
> |
> |---j2: FRJoin[tuple] - scope-20
>     |   |
>     |   Project[bytearray][1] - scope-18
>     |   |
>     |   Project[bytearray][0] - scope-19
>     |
>     |---j1: FRJoin[tuple] - scope-11
>         |   |
>         |   Project[bytearray][0] - scope-9
>         |   |
>         |   Project[bytearray][0] - scope-10
>         |
>         |---l1: Load(file:///Users/tejas/pig-0.8/branch-0.8/x:org.apache.pig.builtin.PigStorage) - scope-0--------
> Global sort: false
> ----------------
> MapReduce node scope-28
> Map Plan
> Store(file:/tmp/temp101387354/tmp-890864787:org.apache.pig.impl.io.InterStorage) - scope-29
> |
> |---f2: New For Each(false,false,false)[bag] - scope-8
>     |   |
>     |   Project[bytearray][0] - scope-2
>     |   |
>     |   Project[bytearray][1] - scope-4
>     |   |
>     |   Project[bytearray][2] - scope-6
>     |
>     |---l3: Load(file:///Users/tejas/pig-0.8/branch-0.8/z:org.apache.pig.builtin.PigStorage) - scope-1--------
> Global sort: false
> ----------------
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
