Thanks a lot for all the replies; this is really helpful.

As you describe it, it's a problem of implementation. BigTable is designed to scale: there are routines to shard the data and distribute it across the pool of connected servers. Could MySQL decide tomorrow to implement something similar, or does the relational model prevent this?

As far as I can see so far, scalability in HBase comes for free, whereas there's no adequate ready-made solution for (free) relational database software.


Fernando Padilla wrote:
I'm no expert, but maybe I can explain it the way I see it; maybe it will resonate with other newbies like me :) Sorry if it's long-winded, or boring for those who already know all this.


BigTable and Hadoop are inherently sharded and distributed. They are architected to store the data in redundant shards across many machines. This allows you to add more capacity (both processing bandwidth and storage capacity) simply by adding more machines to your cluster.
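To make "sharded across many machines" a bit more concrete, here is a minimal sketch of range-based partitioning, which is roughly how BigTable tablets and HBase regions divide a table by row key. The class name `RangeRouter`, the server names and the split keys are all made up for illustration, and replication (the "redundant" part) is left out entirely.

```python
import bisect


class RangeRouter:
    """Maps a row key to the server that holds the key range containing it.

    Hypothetical illustration only; not the HBase or BigTable API.
    """

    def __init__(self, split_keys, servers):
        # N sorted boundaries define N + 1 contiguous key ranges.
        if len(servers) != len(split_keys) + 1:
            raise ValueError("need exactly one server per key range")
        if list(split_keys) != sorted(split_keys):
            raise ValueError("split keys must be sorted")
        self.split_keys = list(split_keys)
        self.servers = list(servers)

    def locate(self, row_key):
        # A key equal to a boundary belongs to the range that starts there.
        return self.servers[bisect.bisect_right(self.split_keys, row_key)]

    def split(self, split_key, new_server):
        """Split the range containing split_key; the upper half moves to new_server."""
        if split_key in self.split_keys:
            raise ValueError("range is already split at this key")
        i = bisect.bisect_right(self.split_keys, split_key)
        self.split_keys.insert(i, split_key)
        self.servers.insert(i + 1, new_server)


if __name__ == "__main__":
    router = RangeRouter(["g", "p"], ["server-1", "server-2", "server-3"])
    print(router.locate("apple"), router.locate("kiwi"), router.locate("zebra"))
    router.split("k", "server-4")  # add a machine, hand it part of one range
    print(router.locate("kiwi"), router.locate("hazel"))
```

The point of the sketch is that adding a machine only moves the keys in the one range that was split; every other range stays where it is.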

MySQL is implemented around the idea of a single database running on a single machine. This isn't inherently bad, as long as you have a machine with enough storage and processing bandwidth (memory/CPU). You can implement sharding over MySQL (and other databases), but the application will have to be hand-tailored to work in such a setup. You're essentially implementing a home-grown data storage system from a collection of regular databases.
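As a rough idea of what that hand-tailoring looks like, here is a minimal sketch of application-level shard routing by key hash. The shard names are placeholders (in a real setup each would be a connection to a separate MySQL server), and `shard_for` is a hypothetical helper, not part of any MySQL client library.

```python
import hashlib

# Placeholder shard identifiers; assumed for illustration only.
SHARDS = ["mysql-shard-0", "mysql-shard-1", "mysql-shard-2", "mysql-shard-3"]


def shard_for(key, shards=SHARDS):
    """Pick a shard by hashing the key.

    hashlib is used instead of the built-in hash() so that the mapping is
    the same in every process and on every machine.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


if __name__ == "__main__":
    for user_id in ["alice", "bob", "carol"]:
        # The application must send every query for this user to this shard.
        print(user_id, "->", shard_for(user_id))
```

Note that with plain modulo hashing like this, changing the number of shards remaps most keys, so growing the cluster means migrating data yourself. That is part of the extra engineering the application takes on.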

You mentioned the MySQL Cluster technology, which is an attempt to bring native sharding and distribution of processing bandwidth to MySQL. But at least in its early architecture (I have not kept up with any evolution of it), it was not yet close to real sharding or full scalability. The technology could be described as a collection of master-master in-memory database nodes backed by a collection of persistence nodes, so it worked great as long as you had enough RAM to hold your whole database in a single node.




Every system has its pros and cons.

A single MySQL is simple and solid, and people have lots of experience with it. If your application allows it, and with some application-specific architecting, you could shard MySQL to a fairly high level of scalability. But this really depends on your application data and the queries you run over that data, and on how much extra engineering you're willing to do (and maintain) to make queries that span multiple shards run efficiently.
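To illustrate why queries that span shards take extra engineering, here is a small scatter-gather sketch for a "top N by score" query. The shards are plain in-memory lists standing in for separate databases, and the data and the names `query_shard` and `top_scores` are invented for the example.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Stand-in shards: each list plays the role of one database holding
# (user, score) rows. Hypothetical data.
SHARD_ROWS = [
    [("alice", 90), ("dave", 40)],
    [("bob", 75), ("erin", 95)],
    [("carol", 60)],
]


def query_shard(rows, limit):
    """Per-shard work, like ORDER BY score DESC LIMIT n on a single shard."""
    return heapq.nlargest(limit, rows, key=lambda row: row[1])


def top_scores(shards, limit):
    """Scatter the query to every shard, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda rows: query_shard(rows, limit), shards))
    merged = [row for part in partials for row in part]
    # The final ordering and limit have to be re-applied by the application.
    return heapq.nlargest(limit, merged, key=lambda row: row[1])


if __name__ == "__main__":
    print(top_scores(SHARD_ROWS, 2))
```

A top-N query merges easily; joins or aggregates across shards need more work than this, and that logic lives in your application rather than in the database.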

BigTable and Hadoop are implemented to support sharding and distributed queries from the get-go, so you can easily scale out without having to add or maintain more complex or homegrown software, just more hardware.




Mork0075 wrote:
Thank you, but I still don't get it.

I've read tons of websites and papers, but there's no clear and well-founded answer to "why use BigTable instead of relational databases".

MySQL Cluster seems to offer the same scalability and level of abstraction, without switching to a non-relational paradigm. Lots of blog posts are highly emotional, without answering the core question:

"Why don't RDBMSs scale, and why does something like BigTable?" Often you read something like this:

"They have also built a system called BigTable, which is a Column Oriented Database, which splits a table into columns rather than rows making is much simpler to distribute and parallelize."

Why?

Really confusing ... ;)

Stuart Sierra wrote:
On Tue, Aug 19, 2008 at 9:44 AM, Mork0075 <[EMAIL PROTECTED]> wrote:
Can you please explain why someone should use HBase for horizontal
scaling instead of a relational database? One reason for me would be
that I don't have to implement the sharding logic myself. Are there others?

A slight tangent -- there are various tools that implement sharding
over relational databases like MySQL.  Two that I know of are
DBSlayer,
http://code.nytimes.com/projects/dbslayer
and MySQL Proxy,
http://forge.mysql.com/wiki/MySQL_Proxy

I don't know of any formal comparisons between sharding traditional
database servers and distributed databases like HBase.
-Stuart



