Ashwin Shankar created PIG-4203:
-----------------------------------
Summary: Implement sparse JOIN on table using bloom filter
Key: PIG-4203
URL: https://issues.apache.org/jira/browse/PIG-4203
Project: Pig
Issue Type: New Feature
Reporter: Ashwin Shankar
Currently when users want to do a join on tables where one of the tables is
sparse(ie only a small percentage of records match during join), they could use
bloom filters to make the make join efficient(See PIG-2328).
However this involves writing some code and calling couple of UDFs -
BuildBloom,Bloom.
It would be great if building of bloom filters in these cases are automatically
done ie Pig automatically inserts them into MR plan when users specify some
keyword.
Calling this keyword "sparse" if no one has any objections.
Eg : C = JOIN A BY a1, B BY b1 USING 'sparse';
Assumption here is that table mentioned on the right side of join is the
smaller table.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)