Rohini Palaniswamy created PIG-4420:
---------------------------------------
Summary: Support for map side cross similar to replicate join
Key: PIG-4420
URL: https://issues.apache.org/jira/browse/PIG-4420
Project: Pig
Issue Type: New Feature
Reporter: Rohini Palaniswamy
Our CROSS implementation is very costly. We recently had a case where a user
was doing a CROSS of 30 million records against 3K records, and it caused a lot
of disk error exceptions during the shuffle phase. We need to add support for a
map-side cross syntax:
C = CROSS A, B using 'replicate';
The smaller table can be loaded into a list (replicate join uses a hashmap) and
iterated through for each record in the bigger table. Since this avoids the
shuffle phase, it should give a major performance boost and drastically reduce
resource usage.
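A rough sketch of the idea, written in Python for brevity rather than as Pig
operator code. The function name map_side_cross and the tuple-based record
representation are assumptions made for illustration; they do not correspond
to anything in the Pig codebase.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple  # hypothetical record type: a plain tuple of fields


def map_side_cross(big: Iterable[Record], small: Iterable[Record]) -> Iterator[Record]:
    """Illustrative map-side cross: replicate the small side, stream the big side."""
    # Load the smaller relation fully into memory once. A plain list is enough
    # because CROSS has no join key to look up (unlike replicate join's hashmap).
    replicated: List[Record] = list(small)

    # Stream the larger relation and pair each record with every replicated one.
    # Nothing is shuffled or spilled; output is produced as the input is read.
    for left in big:
        for right in replicated:
            yield left + right


if __name__ == "__main__":
    a = [(1, "x"), (2, "y")]   # stands in for the 30 million record side
    b = [("p",), ("q",)]       # stands in for the 3K record side
    for row in map_side_cross(a, b):
        print(row)
```

Memory use is bounded by the size of the smaller relation, so this only works
when that side fits in each map task's memory.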