[jira] [Comment Edited] (PIG-4420) Support for map side cross similar to replicate join

Rohini Palaniswamy (JIRA) Thu, 08 Oct 2015 12:48:52 -0700

    [ https://issues.apache.org/jira/browse/PIG-4420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949255#comment-14949255 ]


Rohini Palaniswamy edited comment on PIG-4420 at 10/8/15 7:47 PM:
------------------------------------------------------------------

Learnt today from one of our user scripts that there is a shorter syntax.

A = LOAD ...
B = LOAD ...
C = JOIN A BY 1, B BY 1 USING 'replicated'; 


was (Author: rohini):
Learnt today from one of our user scripts that there is a shorter syntax.

A = LOAD ...
B = LOAD ...
E = JOIN C BY 1, D BY 1 USING 'replicated'; 

> Support for map side cross similar to replicate join
> ----------------------------------------------------
>
>                 Key: PIG-4420
>                 URL: https://issues.apache.org/jira/browse/PIG-4420
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Rohini Palaniswamy
>
>    Our CROSS implementation is very costly. Recently we had a case where a 
> user was doing a CROSS of 30 million records against 3K records, and it 
> caused a lot of disk error exceptions during the shuffle phase. We need to 
> add support for a map-side cross syntax:
> C = CROSS A, B using 'replicate';
> The smaller table can be loaded into a list (replicate join uses a hashmap) 
> and iterated through for each record in the bigger table. It should give a 
> major performance boost and drastically reduce resource usage.
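
The approach described in the issue can be sketched roughly as follows. This is an illustrative Python model only, not Pig's implementation: the function names, the tuple row type, and the in-memory list are all assumptions made for the example. It also shows why a replicated join on a constant key, as in the comment above, would be expected to produce the same rows as a cross.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple  # hypothetical row type: a plain Python tuple


def map_side_cross(big: Iterable[Row], small: Iterable[Row]) -> Iterator[Row]:
    """Pair every row of `big` with every row of `small`, with no shuffle step."""
    # Load the smaller relation into a list once; it is assumed to fit in memory.
    replicated: List[Row] = list(small)
    # Stream the larger relation and emit one output row per (big, small) pair.
    for big_row in big:
        for small_row in replicated:
            yield big_row + small_row


def replicated_join_on_constant(
    big: Iterable[Row], small: Iterable[Row], key: int = 1
) -> Iterator[Row]:
    """Hash join in which both sides use the same constant join key."""
    # Build the hash table from the smaller relation. Because the key is a
    # constant, every row lands in the same bucket.
    table: Dict[int, List[Row]] = defaultdict(list)
    for small_row in small:
        table[key].append(small_row)
    # Probe with the larger relation; each row matches the whole bucket,
    # so the result is the full cross product.
    for big_row in big:
        for small_row in table[key]:
            yield big_row + small_row
```

In this model the only difference between the two is the container holding the smaller relation: a list for the cross, a single-bucket hashmap for the constant-key join.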



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
