Sorry, I hit the send button before finishing the message.

I am building a data cube on top of HBase, and all access to the data is through map/reduce jobs. I want to build a scanner whose first matching criterion is a set intersection of Bloom filters, followed by additional matching criteria specified through the current filter architecture. First, I run a map/reduce job on table A, and for every row I match in table A, I add the row key to a Bloom filter. I then run a map/reduce job on table B, whose row keys are over the same domain as table A. I want the scanner for that second job to use the built-in Bloom filters in HBase: when the scanner goes to fetch a block of data that has a row-key Bloom filter attached, it would intersect that filter with the table A Bloom filter to see whether any of the keys from table A could be in the block. If so, the block is read in and the scanner does additional matching on the rows according to the filter.
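
To make the idea concrete, here is a rough sketch of a row-level version of that check, done in the mapper of the table B job rather than inside the block-level scanner. The class name, the Bloom filter sizing, and the way the table A filter gets loaded are all placeholders, and it uses Hadoop's org.apache.hadoop.util.bloom.BloomFilter rather than HBase's internal block Bloom filters:

import java.io.IOException;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.util.bloom.BloomFilter;
import org.apache.hadoop.util.bloom.Key;
import org.apache.hadoop.util.hash.Hash;

public class TableBMapper extends TableMapper<ImmutableBytesWritable, Result> {

  private BloomFilter tableAKeys;

  @Override
  protected void setup(Context context) throws IOException {
    // Placeholder: the table A job would serialize its Bloom filter
    // (BloomFilter is Writable) to HDFS or the distributed cache, and it
    // would be deserialized here. Sizing parameters are arbitrary.
    tableAKeys = new BloomFilter(1000000, 5, Hash.MURMUR_HASH);
    // tableAKeys.readFields(...);
  }

  @Override
  protected void map(ImmutableBytesWritable rowKey, Result row, Context context)
      throws IOException, InterruptedException {
    // Row-level approximation of the block-level intersection: skip any
    // table B row whose key cannot have matched in table A.
    if (!tableAKeys.membershipTest(new Key(rowKey.copyBytes()))) {
      return;
    }
    // ... additional matching criteria on the row, then emit ...
    context.write(rowKey, row);
  }
}

The scanner I am describing would effectively do this test once per block against the built-in Bloom filter instead of once per row, which is why I am asking about the scanner internals.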
This is a simplification of my problem. I am trying to find out what the complexity of implementing such a feature would be in HBase.

-----------------
Sincerely,
David G. Boney
Chair, Austin ACM SIGKDD
[email protected]
http://www.meetup.com/Austin-ACM-SIGKDD/
http://tech.groups.yahoo.com/group/austinsigkdd/
