They have aspects in common -- Java, datastores, Apache -- but the differences are pretty acute:

+ Cassandra does eventual consistency. HBase does strong consistency. See http://devblog.streamy.com/2009/08/24/cap-theorem/ for more on this.
+ Cassandra does not do BigTable cell versions. It only keeps the latest. In HBase you can have as many versions as you want.
+ Cassandra's underpinnings are based on Amazon Dynamo (keys are distributed/replicated in buckets spread over a consistent hashing unit circle, etc. Apparently there is a means of ordering keys around the circle but I don't know much about this). The HBase chassis tries to be that described in the BigTable paper.
+ Because Cassandra has the above underpinnings, it purportedly can span data centers. HBase has no such facility currently (in 0.21, HBase will have a replication facility).
+ Cassandra does not have a natural sharding notion as there is in HBase -- i.e. HBase Regions -- so hooking Cassandra to MapReduce is awkward.
+ The Cassandra fellas talk of their app being one ball of code only, whereas with HBase there is HDFS, ZooKeeper and then HBase itself (apparently Cassandra has fewer lines of code too).
+ Cassandra has an "extra" dimension in its data model called supercolumns (serialize a List to a cell in HBase if your application requires this extra dimension).

Less tangible differences -- or differences that can be addressed through application and development -- would include community, maturity, number and variety of production installs, and features (monitoring, shells, UIs, admin tools, etc.). On these latter dimensions, HBase would seem to do better, but do the research and make your own call.

Hope this helps,
St.Ack

On Tue, Sep 1, 2009 at 11:45 AM, charles du <[email protected]> wrote:
> Hi:
>
> Does anyone have experience with both Cassandra and HBase? To me, they
> target at a similar problem. I am wondering what are main differences
> between these two, like reliability/performance/features?
>
> Thanks.
>
> --
> tp
>
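
To make the "consistent hashing unit circle" item in the list above a little more concrete, here is a minimal Python sketch of the general Dynamo-style idea: nodes and keys are hashed onto the same ring, and a key lives on the first node clockwise from it plus the next few successors as replicas. This is an illustration only, not Cassandra's actual partitioner; the names `HashRing`, `ring_position`, `nodes_for` and the `replicas` parameter are made up for the example, and MD5 is just an assumed hash.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Map a string to a point on the ring (assumed here: a 128-bit MD5 integer)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")


class HashRing:
    """Hypothetical consistent-hashing ring; a sketch, not Cassandra's code."""

    def __init__(self, nodes, replicas=2):
        self.replicas = replicas
        # Sorted (position, node) pairs form the ring.
        self._ring = sorted((ring_position(n), n) for n in nodes)
        self._positions = [pos for pos, _ in self._ring]

    def add_node(self, node: str) -> None:
        """Place a new node on the ring, keeping both lists sorted."""
        pos = ring_position(node)
        idx = bisect.bisect_left(self._positions, pos)
        self._positions.insert(idx, pos)
        self._ring.insert(idx, (pos, node))

    def nodes_for(self, key: str):
        """Return the first node clockwise from the key, then its successors."""
        if not self._ring:
            return []
        start = bisect.bisect_right(self._positions, ring_position(key))
        count = min(self.replicas, len(self._ring))
        # Wrap around the end of the sorted list to close the circle.
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"], replicas=2)
    print(ring.nodes_for("row-42"))
```

The property that makes this layout attractive is that adding a node only takes over the keys that fall just before it on the circle; every other key keeps its owner, so little data has to move.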
