Ashwin Shankar created PIG-4203:
-----------------------------------

             Summary: Implement sparse JOIN on table using bloom filter
                 Key: PIG-4203
                 URL: https://issues.apache.org/jira/browse/PIG-4203
             Project: Pig
          Issue Type: New Feature
            Reporter: Ashwin Shankar


Currently when users want to do a join on tables where one of the tables is 
sparse(ie only a small percentage of records match during join), they could use 
bloom filters to make the make join efficient(See PIG-2328).
However this involves writing some code and calling couple of UDFs - 
BuildBloom,Bloom. 
It would be great if building of bloom filters in these cases are automatically 
done ie Pig automatically inserts them into MR plan when users specify some 
keyword.
Calling this keyword "sparse" if no one has any objections.
Eg : C = JOIN A BY a1, B BY b1 USING 'sparse';  

Assumption here is that table mentioned on the right side of join is the 
smaller table.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to