Hi All,

I am evaluating HBase and I am not sure whether our
use case fits naturally with HBase's capabilities.  I
would appreciate any help.

We would like to store a large number (billions) of
rows in HBase using a key field to access the values. 
We will then need to continually add, update, and
delete rows.  This is our master table.  What I
describe here naturally fits into what HBase is
designed to do.

It’s this next part that I’m having trouble finding
documentation for.

We would like to use HBase's parallel processing
capabilities to periodically build temporary tables
from it on request.  We would scan the master table's
rows, reading the key and field values, and from those
build a second table organized differently from the
master table.  We would also need to compute counts,
maximums, minimums, and other aggregates specific to
the particular request.
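
To make the shape of that computation concrete, here is a minimal in-memory sketch of the map-and-reduce steps described above.  The field names ("region", "amount") and sample rows are made up for illustration, and this is plain Python rather than the actual HBase/Hadoop API; it only shows the re-key-then-aggregate pattern, not how it would run on a cluster.

```python
from collections import defaultdict

# Hypothetical master-table rows: row key -> field values.
# Field names and values are invented for this example.
master = {
    "row1": {"region": "east", "amount": 10},
    "row2": {"region": "west", "amount": 5},
    "row3": {"region": "east", "amount": 7},
}

def map_phase(rows):
    """Emit (new_key, value) pairs, re-keying each row by a different field."""
    for _, fields in rows.items():
        yield fields["region"], fields["amount"]

def reduce_phase(pairs):
    """Group by the new key and compute count/min/max per group."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {
        key: {"count": len(vs), "min": min(vs), "max": max(vs)}
        for key, vs in groups.items()
    }

# The "second table", keyed differently from the master and
# carrying the per-request aggregates.
derived = reduce_phase(map_phase(master))
```

In a real deployment the map and reduce functions would run in parallel across regions, with the framework handling the grouping step between them.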

This seems like textbook map-reduce functionality, but
I don't see much in the HBase documentation referencing
this kind of setup.  Also, HBase's ten-minute startup
guide states that "[HBase doesn't] need mapreduce".

I suppose we could use HBase as an input to and output
from Hadoop's MapReduce framework.  If we did that,
what would guarantee that each map task was reading
local data?

Any help would be greatly appreciated.  If you have a
reference to a previous discussion or document I could
read, that would be appreciated as well.

-FA
