Sorry, I hit the send button before finishing the message.

I am building a data cube on top of HBase. All access to the data is through 
map/reduce jobs. I want to build a scanner whose first matching criterion is 
a set intersection of Bloom filters, followed by additional matching criteria 
specified through the current filter architecture. First, I run a map/reduce 
job on table A. For every row I match in table A, I add the row key to a 
Bloom filter. I then run a map/reduce job on table B, whose row keys are over 
the same domain as table A's. I want to build a scanner that can use the 
built-in Bloom filters in HBase. When the scanner goes to fetch a block of 
data that has a row-key Bloom filter attached, it takes the set intersection 
of that filter with the table A Bloom filter to see whether any of the keys 
from table A could be in the block. If so, the block is read in and the 
scanner does additional matching on the rows according to the filter.
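
To make the intended flow concrete, here is a minimal sketch of the kind of 
row-key Bloom filter and intersection test I have in mind. It deliberately 
does not use HBase's internal Bloom filter classes; the class name 
RowKeyBloomFilter, the hand-rolled double hashing, and the BitSet 
representation are illustrative stand-ins, not how HBase stores its 
StoreFile Bloom filters.

import java.util.Arrays;
import java.util.BitSet;

/**
 * Minimal row-key Bloom filter to illustrate the scan described above:
 * build one filter over the matching row keys from table A, then, for each
 * block of table B that carries a row-key Bloom filter, AND the two bit
 * vectors. An all-zero result proves the block contains none of table A's
 * keys and the block can be skipped; a non-zero result may be a false
 * positive, so the block is read and the rows are re-checked by the
 * ordinary filter chain.
 */
public class RowKeyBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    public RowKeyBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // Called from the table A map/reduce job for every matching row.
    public void add(byte[] rowKey) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(hash(rowKey, i));
        }
    }

    // Standard membership test: false means the key is definitely absent.
    public boolean mightContain(byte[] rowKey) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(hash(rowKey, i))) {
                return false;
            }
        }
        return true;
    }

    // Set-intersection test against a per-block filter. Both filters must
    // have been built with the same numBits and numHashes for the AND of
    // the bit vectors to be meaningful.
    public boolean intersects(RowKeyBloomFilter blockFilter) {
        BitSet copy = (BitSet) bits.clone();
        copy.and(blockFilter.bits);
        return !copy.isEmpty();  // all zeros => no table A key is in the block
    }

    // Simple double hashing for illustration only; HBase's own Bloom
    // filters hash the raw row-key bytes with their own hash functions.
    private int hash(byte[] rowKey, int i) {
        int h1 = Arrays.hashCode(rowKey);
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9;
        return Math.floorMod(h1 + i * h2, numBits);
    }
}

For the bitwise AND to be meaningful, both filters must use the same 
bit-vector size and hash functions, which is the main constraint when tying 
this to HBase's existing block-level Bloom filters rather than maintaining a 
parallel structure.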

This is a simplification of my problem. I am trying to find out what the 
complexity of implementing such a feature would be in HBase.
-----------------
Sincerely,
David G. Boney
Chair, Austin ACM SIGKDD
[email protected]
http://www.meetup.com/Austin-ACM-SIGKDD/
http://tech.groups.yahoo.com/group/austinsigkdd/
