Re: Is HBase suitable for ...

Bryan Duxbury Mon, 28 Apr 2008 20:42:23 -0700

My replies and questions inline.

On Apr 28, 2008, at 2:57 PM, Max Grigoriev wrote:

Hi there,

I'm doing research to find the right solution for our needs.
We need a persistence layer for groups in a social network.
These groups will have a large amount of data (~100 GB): user profiles, their
activities, etc.

100GB per group, or 100GB overall? How many groups?

All work with these entities has to be done online: a user can ask to
be unsubscribed, or to connect other users to him.
So we'll work with small pieces of a big dataset online, not with big data
offline as a log parser would.
We want the ability to search on different table attributes and, of
course, scalability and failover.

What kind of search on different table attributes do you want to do? There are no general purpose secondary indexes in HBase, so you either have to do a full- or partial-table scan or put the search attribute in the primary key.
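
To make the "put the search attribute in the primary key" option concrete, here is a minimal sketch in Python. It is not the HBase client API: `SortedTable`, `profile_key`, and the `city/user_id` key layout are all made up for illustration. The only property it relies on is that rows are stored sorted by key, so a lookup by key prefix touches a contiguous range of rows instead of the whole table.

```python
import bisect


class SortedTable:
    """Hypothetical in-memory stand-in for a table sorted by row key."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> dict of column values

    def put(self, key, columns):
        # Insert the key in sorted position the first time it is seen.
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = columns

    def prefix_scan(self, prefix):
        """Partial scan: yield rows whose key starts with prefix."""
        start = bisect.bisect_left(self._keys, prefix)
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break  # keys are sorted, so no later key can match
            yield key, self._rows[key]

    def full_scan(self, predicate):
        """Full scan: check every row, needed when the attribute is not in the key."""
        for key in self._keys:
            if predicate(self._rows[key]):
                yield key, self._rows[key]


def profile_key(city, user_id):
    # Assumed key layout: search attribute first, then a unique id.
    return f"{city}/{user_id}"


table = SortedTable()
table.put(profile_key("berlin", "u17"), {"name": "Anna"})
table.put(profile_key("moscow", "u02"), {"name": "Max"})
table.put(profile_key("moscow", "u09"), {"name": "Olga"})

moscow_users = [cols["name"] for _, cols in table.prefix_scan("moscow/")]
by_name = [key for key, _ in table.full_scan(lambda c: c["name"] == "Anna")]
```

The trade-off is that only one attribute can lead the key. Searching on anything else falls back to `full_scan`, or to a second table you maintain yourself, keyed by the other attribute.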

As far as failover, at the moment, HBase has good recovery for regionservers, and no recovery for the master. That's something we're hoping to change in the future.

We need to easily add and remove nodes in the cluster without stopping the entire system.

You can do this, and it's not that hard.

All of this can be done with Amazon SimpleDB, but we don't want to depend on an
external service. That's why we're looking for some third-party product.

We have such candidates:

   - HBase
   - CouchDb
   - HyperTable
   - Own bicycle

Can you tell me whether HBase will work for such a system?

I think HBase can do what you need, but it'd be nice to have more details about what exactly you're going to do with it.

If we have 2 or 3 data centers and we lose the connection between them, what
behavior will we see from HBase?

Is your intent to run a single HBase instance across several data centers? At the moment, if a regionserver is cut off from the master, it will kill itself. This means that if you have your master at one location and regionservers at another, and you lose connectivity, your regionservers at the other locations will shut themselves down. There are solutions to this we've discussed in the past. However, I wonder if maybe the correct solution is not to partition across data centers. It's not something that we've discussed at great length yet, so there might be an easier way to do it than I'm thinking.
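
As a rough illustration of the behavior described above, here is a toy model in Python of a server that stops itself once it has gone too long without reaching the master. This is only a sketch under assumptions: the class name `RegionServerLease`, the 30-second lease, and the explicit timestamps are invented for the example and do not reflect HBase's actual code or configuration.

```python
class RegionServerLease:
    """Toy model of a server that shuts down when cut off from its master."""

    def __init__(self, lease_seconds, now=0.0):
        self.lease_seconds = lease_seconds  # assumed timeout, not an HBase setting
        self.last_contact = now             # time of last successful master contact
        self.running = True

    def heartbeat_ok(self, now):
        # Record a successful report to the master; ignored once stopped.
        if self.running:
            self.last_contact = now

    def tick(self, now):
        # Stop serving if the master has been unreachable longer than the lease.
        if self.running and now - self.last_contact > self.lease_seconds:
            self.running = False
        return self.running


server = RegionServerLease(lease_seconds=30)
server.heartbeat_ok(now=10)
still_up = server.tick(now=35)     # 25 s since last contact: keeps serving
after_cut = server.tick(now=100)   # 90 s since last contact: shuts itself down
server.heartbeat_ok(now=101)       # link is back, but the server has already stopped
```

In this model a restored link does not bring the server back, so every regionserver on the far side of a partition would need to be restarted once connectivity returns.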

And when we restore the connection in 1-2 hours, what should we expect from
HBase?

This is where things would get sticky - how do you resolve conflicts in how data is being served, or worse, how it was split into regions? It seems inherently complicated and unpleasant.



Thank you.
