Hello,

> From: Mork0075 <[EMAIL PROTECTED]>
> Subject: Re: Why is scaling HBase much simpler then scaling a relational db?
> To: hbase-user@hadoop.apache.org
> Date: Wednesday, August 27, 2008, 12:57 AM
>
> > Can you please provide an example of "good
> > de-normalization" in HBase and how its held
> > consistent

I explain it to colleagues as "insert time joins". If your query is going to pull data items {x,y,z}, then you should duplicate/store {x,y,z} next to each other so that they are all pulled off of disk, from one location only, to satisfy the query. This could be either in the same row of the same table (good), or in the same column family of the same table (better).

    table t:
      column c:
        insert using qualifiers 'c:x', 'c:y', and 'c:z'

> Our webapp doenst use joins at the moment anyway.

So I assume your schema is normalized, maybe even in third normal form, and/or uses secondary indexes, tertiary indexes, etc.?

> > As you describe it, its a problem of implementation. [...]
> > Could MySQL perhaps decide tomorrow to implement
> > something similar or does the relational model
> > avoids this?

Bigtable (and therefore HBase) is a response to the particulars of very, very large databases, especially as they relate to the capabilities and limitations of today's storage hardware. Sharding a relational database, in comparison, is a hack and has drawbacks already explained in this thread.

With that said, nothing says that someone could not layer at least some subset of the relational model and SQL support on top of the Bigtable storage model. I think Vertica does this. If MySQL decided to do this one day, that would be great. However, the work involved is substantial in my estimation. Vertica is not giving away its product for free after doing that hard work...

Hope this helps,

   - Andy
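
P.S. As a rough illustration of the "insert time joins" idea above, here is a minimal sketch in Python. It does not use the HBase client API; the Table class, its methods, and the read counter are hypothetical stand-ins that only model the row -> column family -> qualifier layout. The point is that x, y, and z are written side by side at insert time, so the query is a single lookup.

```python
from collections import defaultdict


class Table:
    """Hypothetical in-memory model of a Bigtable-style table.

    Layout: row key -> column family -> qualifier -> value.
    This is an illustration only, not the HBase API.
    """

    def __init__(self):
        self.rows = defaultdict(lambda: defaultdict(dict))
        self.reads = 0  # counts family lookups, a crude proxy for disk locations touched

    def put(self, row, family, qualifier, value):
        self.rows[row][family][qualifier] = value

    def get_family(self, row, family):
        # One lookup returns every qualifier stored under this row and family.
        self.reads += 1
        return dict(self.rows[row][family])


def insert_denormalized(table, row, x, y, z):
    # The "join" happens here, at insert time: x, y and z are stored
    # next to each other as 'c:x', 'c:y' and 'c:z'.
    for qualifier, value in (("x", x), ("y", y), ("z", z)):
        table.put(row, "c", qualifier, value)


t = Table()
insert_denormalized(t, "row1", "value-x", "value-y", "value-z")

# The query needs {x, y, z} and gets all three from one location.
result = t.get_family("row1", "c")
```

The cost is on the write side: if x also lives somewhere else, every copy has to be updated by the application, which is the consistency question asked above.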