[ https://issues.apache.org/jira/browse/PIG-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ashwin Shankar updated PIG-4203:
--------------------------------
Summary: Implement sparse JOIN on tables using bloom filter (was:
Implement sparse JOIN on table using bloom filter)
> Implement sparse JOIN on tables using bloom filter
> --------------------------------------------------
>
> Key: PIG-4203
> URL: https://issues.apache.org/jira/browse/PIG-4203
> Project: Pig
> Issue Type: New Feature
> Reporter: Ashwin Shankar
>
> Currently, when users want to join tables where one of the tables is
> sparse (i.e. only a small percentage of its records match during the join),
> they can use bloom filters to make the join efficient (see PIG-2328).
> However, this involves writing some code and calling a couple of UDFs:
> BuildBloom and Bloom.
> It would be great if the bloom filters were built automatically in these
> cases, i.e. Pig inserts them into the MR plan when users specify some
> keyword.
> I propose calling this keyword "sparse", if no one has any objections.
> E.g.: C = JOIN A BY a1, B BY b1 USING 'sparse';
> The assumption here is that the table on the right side of the join is the
> smaller table.
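
For readers unfamiliar with the technique, the following is a rough, illustrative sketch of what a bloom-filter-assisted join does conceptually. It is plain Python, not Pig code, and it is not the proposed implementation. The names BloomFilter and sparse_join, the filter size, and the number of hash functions are all assumptions made for this example. The right-hand relation is assumed to be the smaller one, as in the description above.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Sizes are arbitrary illustrative defaults, not tuned values.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, key):
        # Derive num_hashes bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key!r}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def sparse_join(large, small, large_key, small_key):
    """Join two lists of tuples on the given column indexes."""
    # Step 1: build the filter from the join keys of the smaller relation.
    bloom = BloomFilter()
    for row in small:
        bloom.add(row[small_key])

    # Step 2: drop rows of the larger relation that cannot possibly match.
    candidates = [row for row in large if bloom.might_contain(row[large_key])]

    # Step 3: ordinary hash join on the survivors. This also discards any
    # false positives that slipped through the filter.
    index = {}
    for row in small:
        index.setdefault(row[small_key], []).append(row)
    joined = [l + s for l in candidates for s in index.get(l[large_key], [])]
    return joined, len(candidates)


if __name__ == "__main__":
    A = [(i, f"a{i}") for i in range(1000)]  # larger relation, key in column 0
    B = [(5, "x"), (42, "y")]                # smaller relation, key in column 0
    rows, survivors = sparse_join(A, B, 0, 0)
    print(rows)
    print(f"{survivors} of {len(A)} rows from A survived the filter")
```

The benefit comes from step 2: only rows that pass the filter need to be carried into the actual join, which matters when very few records in the larger relation have a matching key.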