[ https://issues.apache.org/jira/browse/PIG-4420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14320171#comment-14320171 ]
Brian Johnson commented on PIG-4420:
------------------------------------
You can 'fake' a replicated cross by adding the same constant key to both relations and doing a replicated join on that key:
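A = LOAD ...;  -- large relation
B = LOAD ...;  -- small relation; must fit in memory
C = FOREACH A GENERATE *, 1 AS key:int;  -- same constant key on every record
D = FOREACH B GENERATE *, 1 AS key:int;
-- every key matches, so the join yields the full cross product;
-- 'replicated' loads the last relation (D) into memory on each map task
E = JOIN C BY key, D BY key USING 'replicated';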
> Support for map side cross similar to replicate join
> ----------------------------------------------------
>
> Key: PIG-4420
> URL: https://issues.apache.org/jira/browse/PIG-4420
> Project: Pig
> Issue Type: New Feature
> Reporter: Rohini Palaniswamy
>
> Our CROSS implementation is very costly. We recently had a case where a user
> was doing a CROSS of 30 million records against 3K records, and it caused a lot
> of disk error exceptions during the shuffle phase. We need to add support for
> a map-side cross syntax:
> C = CROSS A, B using 'replicate';
> The smaller table can be loaded into a list (instead of the hashmap used in
> replicated join) and iterated through for each record in the bigger table.
> Because this avoids the shuffle entirely, it should give a major performance
> boost and drastically reduce resource usage.
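A minimal Java sketch of the map-side cross described above follows. It assumes the smaller relation fits in memory. It illustrates the nested-loop idea only: the class `ReplicatedCross` and its `cross` method are hypothetical and are not Pig's implementation or API. The small side is held in a plain list because there is no key to look up. Each large-side record is streamed through once and paired with every in-memory record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical sketch of a map-side ("replicated") cross; not Pig's actual implementation.
public class ReplicatedCross {

    // The small relation is held fully in memory as a plain list: unlike a
    // replicated join there is no key lookup, so no hashmap is needed.
    // Each record of the large relation is visited once and paired with every
    // in-memory record; results are handed to the emit callback as they are produced.
    static <L, R> void cross(Iterable<L> large, List<R> smallInMemory, BiConsumer<L, R> emit) {
        for (L big : large) {
            for (R small : smallInMemory) {
                emit.accept(big, small);
            }
        }
    }

    public static void main(String[] args) {
        List<String> large = List.of("a1", "a2", "a3"); // stands in for one split of the big input
        List<Integer> small = List.of(1, 2);             // stands in for the replicated small table
        List<String> out = new ArrayList<>();
        cross(large, small, (a, b) -> out.add(a + "," + b));
        System.out.println(out.size()); // prints 6
        System.out.println(out);        // prints [a1,1, a1,2, a2,1, a2,2, a3,1, a3,2]
    }
}
```

The output still grows as |large| × |small|. For the case in the description, that is roughly 30 million × 3K ≈ 90 billion pairs. The saving therefore comes from skipping the shuffle of those pairs, not from producing fewer of them.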
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)