[ https://issues.apache.org/jira/browse/PIG-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132212#comment-13132212 ]
Alan Gates commented on PIG-2328:
---------------------------------

bq. Correct me if I am wrong, but this doesn't work if you use 2 different bloom filters in a single task.

Glad you caught that. I'd meant to fix it and forgot.

bq. Why "contains" test for jenkins and murmur?

The Hadoop names for these are Hash.JENKINS_HASH and Hash.MURMUR_HASH. I assumed people might copy some or all of those strings from the Hadoop docs and use them, and I wanted it to work whether they used "jenkins", "jenkins_hash", or "Hash.JENKINS_HASH".

On defining the filter by number of elements and desired accuracy: I agree that would be nice. I may put that in a follow-on patch, though; we'll see if I can finish it in the next few days.

The same goes for operating directly on a relation. I'll see if I can get it working soon; if not, I may do it in a follow-on patch.

> Add builtin UDFs for building and using bloom filters
> -----------------------------------------------------
>
>                 Key: PIG-2328
>                 URL: https://issues.apache.org/jira/browse/PIG-2328
>             Project: Pig
>          Issue Type: New Feature
>          Components: internal-udfs
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>             Fix For: 0.10
>
>         Attachments: PIG-bloom.patch
>
>
> Bloom filters are a common way to select a limited set of records before
> moving data for a join or other heavyweight operation.  Pig should add UDFs
> to support building and using bloom filters.
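
For readers following the "contains" discussion in the comment above, here is a minimal sketch of that kind of lenient name matching. It is not the code from PIG-bloom.patch: the class HashTypeNames, the method toHashType, and the constants JENKINS and MURMUR are hypothetical stand-ins (the real code would presumably return Hadoop's Hash.JENKINS_HASH and Hash.MURMUR_HASH), and the integer values are illustrative only.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not taken from PIG-bloom.patch.
final class HashTypeNames {
    // Stand-ins for Hadoop's Hash.JENKINS_HASH and Hash.MURMUR_HASH.
    // The numeric values are assumptions made for this sketch.
    static final int JENKINS = 0;
    static final int MURMUR = 1;

    private HashTypeNames() {}

    // Accepts any spelling that contains "jenkins" or "murmur",
    // ignoring case, so "jenkins", "jenkins_hash" and
    // "Hash.JENKINS_HASH" all resolve to the same hash type.
    static int toHashType(String name) {
        String lower = name.toLowerCase(Locale.ROOT);
        if (lower.contains("jenkins")) {
            return JENKINS;
        }
        if (lower.contains("murmur")) {
            return MURMUR;
        }
        throw new IllegalArgumentException("Unknown hash type: " + name);
    }
}
```

With this sketch, toHashType("Hash.MURMUR_HASH") and toHashType("murmur") resolve to the same constant, while a name containing neither substring is rejected. The trade-off of a substring test is that it also accepts strings nobody intended, such as "not_jenkins"; an exact match against a fixed list of spellings would be stricter.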