[ 
https://issues.apache.org/jira/browse/PIG-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13132212#comment-13132212
 ] 

Alan Gates commented on PIG-2328:
---------------------------------

bq.  Correct me if I am wrong, but this doesn't work if you use 2 different 
bloom filters in a single task.
Glad you caught that.  I'd meant to fix it and forgot.

bq. Why "contains" test for jenkins and murmur?
The Hadoop names for these are Hash.JENKINS_HASH and Hash.MURMUR_HASH.  I 
assumed people might copy some or all of those strings from the Hadoop docs and 
use them, and I wanted it to work whether they used "jenkins", "jenkins_hash", or 
"Hash.JENKINS_HASH".
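
To illustrate the "contains" matching described above, here is a minimal sketch. 
It is not the code from the patch: the class name HashTypeParser, the method 
parse, and the local integer constants are hypothetical stand-ins for Hadoop's 
Hash.JENKINS_HASH and Hash.MURMUR_HASH, so that the example runs without Hadoop 
on the classpath.

```java
import java.util.Locale;

/**
 * Hypothetical helper showing substring-based matching of a hash name.
 * The constants are local stand-ins, not Hadoop's own Hash constants.
 */
final class HashTypeParser {
    static final int JENKINS_HASH = 0; // stand-in for Hash.JENKINS_HASH
    static final int MURMUR_HASH = 1;  // stand-in for Hash.MURMUR_HASH

    private HashTypeParser() {}

    /** Accepts "jenkins", "jenkins_hash", "Hash.JENKINS_HASH", and similar. */
    static int parse(String name) {
        if (name == null) {
            throw new IllegalArgumentException("Hash type must not be null");
        }
        // Lower-case once so the match ignores case.
        String lower = name.toLowerCase(Locale.ROOT);
        if (lower.contains("jenkins")) {
            return JENKINS_HASH;
        }
        if (lower.contains("murmur")) {
            return MURMUR_HASH;
        }
        throw new IllegalArgumentException("Unknown hash type: " + name);
    }
}
```

With this approach any spelling that contains "jenkins" or "murmur" is accepted, 
which is more permissive than an exact match; that is the trade-off the question 
was about.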

On the definition by number of elements and desired accuracy, I agree that would 
be nice.  I may put that in a follow-on patch, though; we'll see if I can finish 
it in the next few days.  
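
For reference, sizing a filter from an element count and a desired accuracy could 
look roughly like the sketch below. It assumes "accuracy" means the target 
false-positive probability p and uses the standard bloom filter formulas 
m = -n ln(p) / (ln 2)^2 for the number of bits and k = (m / n) ln 2 for the number 
of hash functions. The class and method names are hypothetical, and this is not 
taken from the patch.

```java
/** Hypothetical sizing helper based on the standard bloom filter formulas. */
final class BloomSizing {
    private BloomSizing() {}

    /** Bits needed for n elements at false-positive probability p (0 < p < 1). */
    static long numBits(long n, double p) {
        if (n <= 0 || p <= 0.0 || p >= 1.0) {
            throw new IllegalArgumentException("need n > 0 and 0 < p < 1");
        }
        double ln2 = Math.log(2);
        // m = -n * ln(p) / (ln 2)^2, rounded up to a whole number of bits
        return (long) Math.ceil(-n * Math.log(p) / (ln2 * ln2));
    }

    /** Number of hash functions for n elements stored in m bits; at least 1. */
    static int numHashes(long n, long m) {
        if (n <= 0 || m <= 0) {
            throw new IllegalArgumentException("need n > 0 and m > 0");
        }
        // k = (m / n) * ln 2, rounded to the nearest integer
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }
}
```

The two results would then take the place of the vector size and hash count that 
a user would otherwise have to choose by hand.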

Same for operating directly on a relation.  I'll see if I can get it working 
soon; if not, I may do it in a follow-on patch.
                
> Add builtin UDFs for building and using bloom filters
> -----------------------------------------------------
>
>                 Key: PIG-2328
>                 URL: https://issues.apache.org/jira/browse/PIG-2328
>             Project: Pig
>          Issue Type: New Feature
>          Components: internal-udfs
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>             Fix For: 0.10
>
>         Attachments: PIG-bloom.patch
>
>
> Bloom filters are a common way to select a limited set of records before 
> moving data for a join or other heavyweight operation.  Pig should add UDFs 
> to support building and using bloom filters.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
