[ 
https://issues.apache.org/jira/browse/PIG-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashwin Shankar updated PIG-4203:
--------------------------------
    Summary: Implement sparse JOIN on tables using bloom filter  (was: 
Implement sparse JOIN on table using bloom filter)

> Implement sparse JOIN on tables using bloom filter
> --------------------------------------------------
>
>                 Key: PIG-4203
>                 URL: https://issues.apache.org/jira/browse/PIG-4203
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Ashwin Shankar
>
> Currently when users want to do a join on tables where one of the tables is 
> sparse(ie only a small percentage of records match during join), they could 
> use bloom filters to make the make join efficient(See PIG-2328).
> However this involves writing some code and calling couple of UDFs - 
> BuildBloom,Bloom. 
> It would be great if building of bloom filters in these cases are 
> automatically done ie Pig automatically inserts them into MR plan when users 
> specify some keyword.
> Calling this keyword "sparse" if no one has any objections.
> Eg : C = JOIN A BY a1, B BY b1 USING 'sparse';  
> Assumption here is that table mentioned on the right side of join is the 
> smaller table.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to