[jira] [Commented] (PIG-4420) Support for map side cross similar to replicate join

Brian Johnson (JIRA) Fri, 13 Feb 2015 12:31:50 -0800

    [ 
https://issues.apache.org/jira/browse/PIG-4420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320704#comment-14320704
 ]


Brian Johnson commented on PIG-4420:
------------------------------------

That seems to depend on how you are using the cross join output. In our case, 
we are joining it to another relation. We might be able to restructure that, 
but the other downside to this approach is that it triggers a reduce with 
'group stations all', whereas the replicated join doesn't.

> Support for map side cross similar to replicate join
> ----------------------------------------------------
>
>                 Key: PIG-4420
>                 URL: https://issues.apache.org/jira/browse/PIG-4420
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Rohini Palaniswamy
>
>    Our CROSS implementation is very costly.  We recently had a case where a 
> user was doing a CROSS of 30 million records against 3K records, and it 
> caused a lot of disk error exceptions during the shuffle phase. We need to 
> add support for a map side cross syntax:
> C = CROSS A, B using 'replicate';
> The smaller table can be loaded into a list (a hashmap in replicate join) 
> and iterated through for each record in the bigger table. It should give a 
> major performance boost and drastically reduce the resource usage.
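
For illustration only, the idea in the description (hold the smaller relation in a list and iterate over it for each record of the bigger one) can be sketched in plain Python. This is not Pig's implementation; the function name `replicated_cross` and the sample relations are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def replicated_cross(big: Iterable[Tuple], small: Iterable[Tuple]) -> Iterator[Tuple]:
    """Hypothetical map-side cross: no shuffle, no reduce.

    The small relation is materialized once in memory (as the replicated
    side of a replicated join would be), and the big relation is streamed.
    """
    small_rows: List[Tuple] = list(small)  # assumed to fit in memory
    for big_row in big:                    # a mapper would stream its own split
        for small_row in small_rows:
            yield big_row + small_row      # concatenate the fields of both rows


if __name__ == "__main__":
    # Tiny stand-ins for the 30 million row and 3K row relations.
    A = [(1, "a"), (2, "b"), (3, "c")]
    B = [("x",), ("y",)]
    for row in replicated_cross(A, B):
        print(row)
```

Each record of the big relation is read once, and the output size is still |A| * |B|; the saving comes from not shuffling either input to reducers.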



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
