Re: regarding datamodel inside HBase

Charles Mason Tue, 13 Jan 2009 04:03:38 -0800

On Tue, Jan 13, 2009 at 11:06 AM, shiraz memon
<[email protected]> wrote:
> Hi,
>
> I am new to HBase and find it very interesting in terms of fulfilling
> the scalability requirements of an enterprise. While browsing the
> documentation, I unfortunately found an incomplete article on the HBase
> data model, which can be viewed at
> http://wiki.apache.org/hadoop/Hbase/DataModel. Do you have any other info
> on relational/Bigtable conceptual mapping, or plans to complete it? I know
> that HBase is not meant to substitute for current RDBMSs, but it could be a
> good start for HBase beginners like me who also have some background in
> relational models. Sorry if I am going on too long.


Well, the HBase data model is extremely simple, and that Wiki page is
accurate. The best thing to do as an introduction is to read the
Google Bigtable paper. It introduces the basic concepts of this type
of DB, along with its strengths and weaknesses.

You have to remember that HBase has no understanding of what's stored
inside each column; it's all just binary data to it. The only ordering it
knows is on row keys, which it sorts alphabetically (strictly, byte by
byte). It has no concept of data types.
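
To make the sorting and typing point concrete, here is a small Python
sketch. It does not talk to HBase at all; it only mimics byte-wise key
ordering using plain bytes values. The key names and the helpers
encode_counter / decode_counter are made up for the example.

```python
import struct

# Row keys are compared byte by byte, so numeric suffixes do not sort numerically.
keys = [b"user-9", b"user-10", b"user-2"]
sorted_keys = sorted(keys)  # Python also compares bytes lexicographically

# Zero-padding the numeric part (a client-side convention) restores numeric order.
padded = sorted(b"user-%04d" % n for n in (9, 10, 2))


def encode_counter(value: int) -> bytes:
    """Hypothetical helper: the client decides how an integer becomes bytes."""
    return struct.pack(">q", value)  # 8-byte big-endian signed integer


def decode_counter(raw: bytes) -> int:
    """The store only sees bytes; the client must know how to read them back."""
    return struct.unpack(">q", raw)[0]


print(sorted_keys)
print(padded)
print(decode_counter(encode_counter(42)))
```

The takeaway is that key layout and value encoding are conventions the
client has to pick and apply consistently, since the store will not
check them for you.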

Basically you lose many of the sanity checks a relational DB provides, as
well as the enforcement of relationships via things like triggers. This
simplicity is what allows it to work so effectively across a cluster.
Virtually everything it does can be done with access to a small subset
of the DB. A conventional relational DB is designed with the expectation
that it costs very little to access any part of the database, which is
fine until you start to cluster it.

Some of that logic can be added back client side. For my project I
have developed a wrapper (like a weakly relational ORM) to handle the
interactions with HBase. I am going to be open sourcing it soon.

For example, you can use extra tables to provide additional column
indexing, but you need to make sure each client update also updates the
index. This isn't too bad if everything accesses it via the wrapper,
which handles this automatically. Leaving it up to each developer to
remember is far too risky and ultimately wastes lots of time.
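
As a rough illustration of the index-table idea (not the wrapper
mentioned above, which is not shown here), the sketch below keeps a data
table and an index table in step through a single put method. Everything
in it is hypothetical: the IndexedTable class, its method names, and the
use of in-memory dictionaries where a real wrapper would issue HBase
writes.

```python
class IndexedTable:
    """Toy in-memory stand-in for a data table plus one index table."""

    def __init__(self, indexed_column: str):
        self.indexed_column = indexed_column
        self.rows = {}   # row key -> {column: value}
        self.index = {}  # indexed value -> set of row keys

    def put(self, row_key: str, columns: dict) -> None:
        """Write columns to a row and keep the index table consistent."""
        old = self.rows.get(row_key, {})
        new = {**old, **columns}
        old_value = old.get(self.indexed_column)
        new_value = new.get(self.indexed_column)
        if old_value != new_value:
            # Drop the stale index entry before adding the new one.
            if old_value is not None:
                self.index[old_value].discard(row_key)
                if not self.index[old_value]:
                    del self.index[old_value]
            if new_value is not None:
                self.index.setdefault(new_value, set()).add(row_key)
        self.rows[row_key] = new

    def find_by_indexed(self, value) -> list:
        """Look up row keys through the index instead of scanning all rows."""
        return sorted(self.index.get(value, set()))


users = IndexedTable("city")
users.put("u1", {"city": "Leeds", "name": "Ann"})
users.put("u2", {"city": "Leeds"})
users.put("u1", {"city": "York"})  # moves u1 from one index entry to another

print(users.find_by_indexed("Leeds"))
print(users.find_by_indexed("York"))
```

Here both updates happen inside one process. Against a real cluster the
data write and the index write would be separate operations, so a
wrapper would also need to decide what to do if the second one fails.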

HBase is a great solution if you need the scalability or expect to
need it. The problem is that it can be a fair amount of work to make an
existing system designed around a traditional relational DB work with
it, as their usage and data models are so different. With our project
we have made a conscious choice to use this type of DB because we feel
MySQL would struggle under our potential load. We have done it early
in the development cycle, as we can see that making the switch later
would be massively more work given how much the two DB models differ.

Hope that helps a bit.

Charlie M
