[ https://issues.apache.org/jira/browse/PIG-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136914#comment-13136914 ]
Dmitriy V. Ryaboy commented on PIG-2328:
----------------------------------------

This looks great. My only remaining issue is that you work around the multiple-bloom-filters problem by grabbing the filename (dirname, really) of the bloom filter being loaded, and that name may be the same for two different filters. For example, it's not unreasonable to load "mydataset/sellers/bloom" and "/mydataset/buyers/bloom". Perhaps simply replacing "/" with "_" would be better?

> Add builtin UDFs for building and using bloom filters
> -----------------------------------------------------
>
>                 Key: PIG-2328
>                 URL: https://issues.apache.org/jira/browse/PIG-2328
>             Project: Pig
>          Issue Type: New Feature
>          Components: internal-udfs
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>             Fix For: 0.10
>
>         Attachments: PIG-bloom-2.patch, PIG-bloom.patch
>
>
> Bloom filters are a common way to select a limited set of records before
> moving data for a join or other heavyweight operation. Pig should add UDFs
> to support building and using bloom filters.
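
To make the naming concern in the comment concrete, here is a minimal sketch. The class and method names (BloomFilterNames, lastComponent, flatten) are hypothetical and are not taken from the attached patches or from Pig itself; the sketch only assumes that the name is derived from the path string of the filter being loaded.

```java
// Hypothetical helper for illustration only; not part of Pig or the PIG-2328 patches.
final class BloomFilterNames {
    private BloomFilterNames() {}

    // Name derived from the last path component only.
    // Different filters stored under the same final name collide.
    static String lastComponent(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Name derived from the whole path, with every '/' replaced by '_',
    // as suggested in the comment.
    static String flatten(String path) {
        return path.replace('/', '_');
    }

    public static void main(String[] args) {
        String sellers = "mydataset/sellers/bloom";
        String buyers = "/mydataset/buyers/bloom";

        // Both paths end in the same component, so these two names are equal.
        System.out.println(lastComponent(sellers) + " vs " + lastComponent(buyers));

        // The flattened names keep the distinguishing parts of each path.
        System.out.println(flatten(sellers) + " vs " + flatten(buyers));
    }
}
```

One caveat with this approach: the mapping is not strictly one-to-one, since "a/b_c" and "a_b/c" both flatten to "a_b_c", so it reduces collisions rather than ruling them out.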