Hello,

> From: Mork0075 <[EMAIL PROTECTED]>
> Subject: Re: Why is scaling HBase much simpler then scaling a relational db?
> To: hbase-user@hadoop.apache.org
> Date: Wednesday, August 27, 2008, 12:57 AM
> > Can you please provide an example of "good
> > de-normalization" in HBase and how it's held
> > consistent

I explain it to colleagues as "insert-time joins". If
your query is going to pull data items {x,y,z}, then
you should duplicate/store {x,y,z} next to each other
so that they are all pulled off disk -- from one
location only -- to satisfy the query. This could be
either in the same row of the same table (good), or
in the same column family of the same table (better).

   table t:
     column c:

   insert using qualifiers 'c:x', 'c:y', and 'c:z'
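
To make the idea concrete, here is a minimal sketch in
Python. It is only an in-memory toy, not the HBase
client API: the ToyTable class, the insert_joined
helper, the row key 'row1' and the cell values are all
made up for illustration. It shows x, y and z being
written side by side under column family 'c' of one
row, so that one lookup returns all three.

```python
# Toy stand-in for a Bigtable-style table (hypothetical, NOT the HBase API):
# row key -> {"family:qualifier": value}
class ToyTable:
    def __init__(self):
        self.rows = {}

    def put(self, row_key, column, value):
        # Store one cell under the given row and "family:qualifier" column.
        self.rows.setdefault(row_key, {})[column] = value

    def get(self, row_key, family):
        # Return every cell of one column family for one row: a single lookup.
        prefix = family + ":"
        row = self.rows.get(row_key, {})
        return {col: val for col, val in row.items() if col.startswith(prefix)}


def insert_joined(table, row_key, x, y, z):
    # The "join" happens here, at insert time: x, y and z are written
    # next to each other in column family 'c' of the same row.
    table.put(row_key, "c:x", x)
    table.put(row_key, "c:y", y)
    table.put(row_key, "c:z", z)


t = ToyTable()
insert_joined(t, "row1", "value of x", "value of y", "value of z")

# The query is satisfied from one location, with no join at read time.
result = t.get("row1", "c")
print(result)
```

The read side stays trivial because the work was done
on the write side, which is the trade being made.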

> Our webapp doesn't use joins at the moment anyway.

So I assume your schema is normalized, maybe even in
third normal form, and/or uses secondary indexes, or
tertiary indexes, etc.?

> > As you describe it, it's a problem of implementation.
[...]
> > Could MySQL perhaps decide tomorrow to implement
> > something similar or does the relational model
> > avoid this?

Bigtable (and therefore HBase) is a response to the
particulars of very large databases, especially as
they relate to the capabilities and limitations of
today's storage hardware. Sharding a relational
database is, in comparison, a hack, and it has
drawbacks already explained in this thread.

With that said, nothing says that someone could not
layer at least some subset of the relational model and
SQL support on top of the Bigtable storage model. I
think Vertica does this. If MySQL decided to do this
one day, that would be great. However, the work
involved is substantial in my estimation. Vertica is
not giving its product away for free after doing that
hard work...

Hope this helps,

    - Andy



      
