Comments inline:

> -----Original Message-----
> From: Mork0075 [mailto:[EMAIL PROTECTED]
> Sent: Thursday, August 21, 2008 8:48 AM
> To: [EMAIL PROTECTED]; hbase-user@hadoop.apache.org
> Subject: Re: Why is scaling HBase much simpler then scaling a relational db?
>
> Thank you, but I still don't get it.
>
> I've read tons of websites and papers, but there's no clear and
> well-founded answer to "why use BigTable instead of relational databases".
>
> MySQL Cluster seems to offer the same scalability and level of
> abstraction, without switching to a non-relational paradigm. Lots of
> blog posts are highly emotional, without answering the core question:
I think you'd find that when the size of your data approaches 10-100 TB, relational databases run out of gas. Further, as your data grows, with a relational database you need to add another shard, redistribute your data, and make the client aware that rows are now split over n+1 shards instead of n (there is a small sketch of this at the end of this message). Bigtable has shown that it can scale to 100s of TB of data (or even more - I don't have any recent numbers on the largest Bigtable instance). All of this can be done by just bringing up a new server: data is redistributed automatically, and client applications do not need to be changed.

> "Why RDBMS don't scale and why something like BigTable does". Often you
> read something like this:
>
> "They have also built a system called BigTable, which is a Column
> Oriented Database, which splits a table into columns rather than rows
> making it much simpler to distribute and parallelize."
>
> Why?

In a column-oriented data store, nulls are free. Not so for a row-oriented database, which must allocate space for a column even if the current value is null.
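To make the null-storage point concrete, here is a minimal Java sketch. It is purely illustrative: the column names and the map-of-cells layout are my own example, not HBase's or Bigtable's actual storage format.

    import java.util.HashMap;
    import java.util.Map;

    // Purely illustrative -- not HBase's real on-disk layout.
    public class SparseVsFixed {

        // Row-oriented, fixed layout: every row reserves a slot for
        // every column, so a null still costs space.
        static final String[] COLUMNS = {"name", "email", "phone", "fax"};

        static String[] fixedRow(String name, String email) {
            String[] row = new String[COLUMNS.length]; // 4 slots allocated
            row[0] = name;
            row[1] = email;
            // phone and fax stay null, but their slots still exist
            return row;
        }

        // Column-oriented, sparse layout: only cells that actually hold
        // a value are stored, keyed by column name.
        static Map<String, String> sparseRow(String name, String email) {
            Map<String, String> cells = new HashMap<String, String>();
            cells.put("name", name);
            cells.put("email", email);
            // nothing stored for phone or fax -- the nulls are free
            return cells;
        }

        public static void main(String[] args) {
            System.out.println(fixedRow("alice", "a@example.com").length);  // 4
            System.out.println(sparseRow("alice", "a@example.com").size()); // 2
        }
    }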
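And to go back to the sharding point above: with manual sharding, the client itself decides which database holds a row, typically with something like a hash-modulo scheme. The sketch below is hypothetical (the shard host names and the hashing scheme are made up for illustration), but it shows why going from n to n+1 shards forces both data movement and client changes, whereas an HBase/Bigtable client just talks to the table and the system moves data around on its own.

    import java.util.Arrays;
    import java.util.List;

    // Hypothetical client-side sharding for a relational setup; the host
    // names and the hash-modulo scheme are only for illustration.
    public class ManualSharding {

        // With n shards, the client itself decides where a row lives.
        static String shardFor(String rowKey, List<String> shards) {
            int idx = (rowKey.hashCode() & 0x7fffffff) % shards.size();
            return shards.get(idx);
        }

        public static void main(String[] args) {
            List<String> threeShards = Arrays.asList("db1", "db2", "db3");
            List<String> fourShards = Arrays.asList("db1", "db2", "db3", "db4");

            String key = "user:12345";
            // Adding a shard silently changes the answer for existing keys,
            // so rows have to be moved and every client has to be updated.
            System.out.println(shardFor(key, threeShards));
            System.out.println(shardFor(key, fourShards));
        }
    }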