[ https://issues.apache.org/jira/browse/PIG-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thejas M Nair updated PIG-1672:
-------------------------------
Attachment: PIG-1672.1.patch
PIG-1672.1.patch fixes the way the new merged foreach is added to the plan
after two foreach statements are merged. I will upload another patch with test cases.
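
For readers unfamiliar with this class of bug, the following Python sketch shows one way that replacing an operator in a plan can reorder a join's inputs. It is not taken from the patch. The `Op` class, both helper functions, and the name `merged_foreach` are hypothetical, and the sketch assumes the switch comes from the replacement operator being appended to the consumer's input list rather than put back in the original slot.

```python
# Hypothetical miniature of a logical plan: each operator keeps an ordered
# list of its inputs. None of these names come from the Pig code base.

class Op:
    def __init__(self, name, inputs=None):
        self.name = name
        self.inputs = list(inputs or [])


def replace_input_append(consumer, old, new):
    """Order-changing variant: drops `old` and appends `new` at the end."""
    consumer.inputs.remove(old)
    consumer.inputs.append(new)


def replace_input_in_place(consumer, old, new):
    """Order-preserving variant: `new` takes the slot `old` occupied."""
    idx = consumer.inputs.index(old)
    consumer.inputs[idx] = new


def build_join():
    # Mirrors `j1 = join f2 by a, l1 by a`: f2 is the first (left) input.
    l1 = Op("l1")
    f2 = Op("f2")
    j1 = Op("j1", [f2, l1])
    return j1, f2


merged = Op("merged_foreach")  # stands in for the merged f1 + f2

j1, f2 = build_join()
replace_input_append(j1, f2, merged)
order_append = [op.name for op in j1.inputs]

j1, f2 = build_join()
replace_input_in_place(j1, f2, merged)
order_in_place = [op.name for op in j1.inputs]
```

With the appending variant the join sees `l1` first and the merged foreach second, which is the kind of switch described in the issue. The in-place variant keeps the merged foreach as the first input.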
> order of relations in replicated join gets switched in a query where first
> relation has two mergeable foreach statements
> ------------------------------------------------------------------------------------------------------------------------
>
> Key: PIG-1672
> URL: https://issues.apache.org/jira/browse/PIG-1672
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.8.0
> Reporter: Thejas M Nair
> Assignee: Thejas M Nair
> Fix For: 0.8.0
>
> Attachments: PIG-1672.1.patch
>
>
> The replicated join query was running out of memory because the order of
> relations got switched during logical plan optimization, so Pig attempted
> to load the larger (left) relation into memory instead of streaming it.
> {code}
> cat replj.pig
> l1 = load 'x' as (a);
> l2 = load 'y' as (b);
> l3 = load 'z' as (a1,b1,c1,d1);
> f1 = foreach l3 generate a1 as a, b1 as b, c1 as c, d1 as d;
> f2 = foreach f1 generate a,b,c;
> j1 = join f2 by a, l1 by a using 'replicated';
> j2 = join j1 by b, l2 by b using 'replicated';
> explain j2;
> {code}
> Note that in the MR plan printed below, the Load in the MR job with join
> operations has 'x' as the input instead of 'z'.
> {code}
> #--------------------------------------------------
> # Map Reduce Plan
> #--------------------------------------------------
> MapReduce node scope-30
> Map Plan
> Store(file:/tmp/temp101387354/tmp-125684214:org.apache.pig.impl.io.InterStorage) - scope-31
> |
> |---l2: Load(file:///Users/tejas/pig-0.8/branch-0.8/y:org.apache.pig.builtin.PigStorage) - scope-17--------
> Global sort: false
> ----------------
> MapReduce node scope-27
> Map Plan
> j2: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-26
> |
> |---j2: FRJoin[tuple] - scope-20
>     |   |
>     |   Project[bytearray][1] - scope-18
>     |   |
>     |   Project[bytearray][0] - scope-19
>     |
>     |---j1: FRJoin[tuple] - scope-11
>         |   |
>         |   Project[bytearray][0] - scope-9
>         |   |
>         |   Project[bytearray][0] - scope-10
>         |
>         |---l1: Load(file:///Users/tejas/pig-0.8/branch-0.8/x:org.apache.pig.builtin.PigStorage) - scope-0--------
> Global sort: false
> ----------------
> MapReduce node scope-28
> Map Plan
> Store(file:/tmp/temp101387354/tmp-890864787:org.apache.pig.impl.io.InterStorage) - scope-29
> |
> |---f2: New For Each(false,false,false)[bag] - scope-8
>     |   |
>     |   Project[bytearray][0] - scope-2
>     |   |
>     |   Project[bytearray][1] - scope-4
>     |   |
>     |   Project[bytearray][2] - scope-6
>     |
>     |---l3: Load(file:///Users/tejas/pig-0.8/branch-0.8/z:org.apache.pig.builtin.PigStorage) - scope-1--------
> Global sort: false
> ----------------
> {code}
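
To illustrate why the input order matters for a replicated join, here is a minimal Python sketch of the general technique: the replicated (right) input is held in an in-memory hash table while the left input is streamed through it. This is not Pig's implementation. The function name, its parameters, and the sample rows are hypothetical.

```python
from collections import defaultdict


def replicated_join(streamed, replicated, streamed_key, replicated_key):
    """Join a streamed (left) input against one held in memory (right)."""
    # The replicated input is fully materialized in a hash table, so its
    # size determines the memory footprint of the join.
    table = defaultdict(list)
    for row in replicated:
        table[row[replicated_key]].append(row)
    # The streamed input is consumed one row at a time and never stored.
    for row in streamed:
        for match in table.get(row[streamed_key], []):
            yield row + match


# Made-up rows standing in for the larger relation 'z' and the smaller 'x'.
z = [("k1", "b1", "c1"), ("k2", "b2", "c2"), ("k1", "b3", "c3")]
x = [("k1",), ("k3",)]

result = list(replicated_join(iter(z), x, 0, 0))
```

If the two inputs trade places, the larger relation is the one that ends up in the hash table, which matches the out-of-memory failure reported above.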