Rohini Palaniswamy created PIG-5260:
---------------------------------------

             Summary: Separate bloom filter for each reducer of the join
                 Key: PIG-5260
                 URL: https://issues.apache.org/jira/browse/PIG-5260
             Project: Pig
          Issue Type: New Feature
            Reporter: Rohini Palaniswamy


   Currently, bloom join allows specifying the number of bloom filters, and all 
of them are broadcast to each join vertex. The bloom filter partition logic is 
joinkey hashcode % num_filters. The reducer partition logic is joinkey hashcode 
% num_reducers. If we made the number of bloom filters equal to the number of 
reducers in the join, we could broadcast just bloom filter 0 to reducer 0, bloom 
filter 1 to reducer 1, and so on. However, a one-one edge will most likely 
prevent auto-reduce parallelism from being applied to the scatter-gather edge, 
so we need to see whether a custom one-one broadcast edge is needed for this.
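
A minimal sketch of the partitioning argument above, not Pig code. It assumes 
a stand-in hash (zlib.crc32) in place of the join key's hashcode, and the 
helper names (key_hash, filter_index, reducer_index, 
filters_needed_per_reducer) are hypothetical. It shows that when num_filters 
equals num_reducers, every key routed to reducer r is covered by bloom filter 
r alone.

```python
import zlib


def key_hash(join_key: str) -> int:
    # Assumption: stand-in for the join key's hashcode; any deterministic,
    # non-negative hash works for this illustration.
    return zlib.crc32(join_key.encode("utf-8"))


def filter_index(join_key: str, num_filters: int) -> int:
    # Bloom filter partition logic: joinkey hashcode % num_filters
    return key_hash(join_key) % num_filters


def reducer_index(join_key: str, num_reducers: int) -> int:
    # Reducer partition logic: joinkey hashcode % num_reducers
    return key_hash(join_key) % num_reducers


def filters_needed_per_reducer(keys, num_filters, num_reducers):
    # For each reducer, collect the bloom filter indexes that cover
    # the keys routed to that reducer.
    needed = {r: set() for r in range(num_reducers)}
    for key in keys:
        r = reducer_index(key, num_reducers)
        needed[r].add(filter_index(key, num_filters))
    return needed


keys = [f"key-{i}" for i in range(1000)]

# Equal counts: reducer r only ever needs bloom filter r.
aligned = filters_needed_per_reducer(keys, num_filters=4, num_reducers=4)

# Unequal counts: a reducer may need several filters, which is why all
# filters are broadcast to every join vertex today.
mismatched = filters_needed_per_reducer(keys, num_filters=3, num_reducers=4)

for r in sorted(aligned):
    print(r, sorted(aligned[r]), sorted(mismatched[r]))
```

With equal counts both partition functions compute the same value for a given 
key, so the filter-to-reducer mapping is one-one by construction. The edge 
type needed to deliver it in Tez is the open question raised above and is not 
modeled here.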



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
