[ https://issues.apache.org/jira/browse/PIG-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13139536#comment-13139536 ]
Daniel Dai commented on PIG-2328:
---------------------------------

Here are some comments:

1. The javadoc sample is wrong:
define bb BuildBloom(100, 3, Hash.JENKINS_HASH); => define bb BuildBloom('jenkins', '100', '0.1');
C = filter B by Bloom(mybloom, z); => C = filter B by Bloom(z);

2. It should be trivial to convert it into a scalar, so that we get out of the business of figuring out the symbolic link name:
{code}
define bb BuildBloom('jenkins', '10', '0.1');
small = load 'S' as (x, y, z);
grpd = group small all;
fltrd = foreach grpd generate bb(small.x) as a0;
large = load 'L' as (a, b, c);
flarge = filter large by Bloom(fltrd.a0, a);
joined = join small by x, flarge by a;
store joined into 'results';
{code}

Want me to upload a patch?

> Add builtin UDFs for building and using bloom filters
> -----------------------------------------------------
>
>                 Key: PIG-2328
>                 URL: https://issues.apache.org/jira/browse/PIG-2328
>             Project: Pig
>          Issue Type: New Feature
>          Components: internal-udfs
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>             Fix For: 0.10
>
>         Attachments: PIG-bloom-2.patch, PIG-bloom.patch
>
>
> Bloom filters are a common way to select a limited set of records before
> moving data for a join or other heavyweight operation. Pig should add UDFs
> to support building and using bloom filters.