HBase is about the same speed as Cassandra, or slightly faster. Cassandra does a write by sending "W" requests out. In HBase a write is 1 call, and since HBase overlays HDFS there are calls out to HDFS to persist the write in a log. So the speeds should be about the same. I can get 100-300k writes/sec on a 19-node cluster.
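For anyone unfamiliar with the "W" (and, for reads, "R") terminology, here is a toy Python model of the two paths. It is only a sketch under simplifying assumptions: the class and function names are made up for illustration, the quorum write sends exactly W requests as described above, and conflicts are resolved by newest timestamp. Neither system is actually implemented this way.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One storage node holding key -> (timestamp, value)."""
    data: dict = field(default_factory=dict)

    def put(self, key, ts, value):
        current = self.data.get(key)
        # Keep only the newest version of each key.
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def get(self, key):
        return self.data.get(key)


def quorum_write(replicas, w, key, ts, value):
    """Dynamo-style write: one request to each of W replicas."""
    calls = 0
    for replica in replicas[:w]:
        replica.put(key, ts, value)
        calls += 1
    return calls


def quorum_read(replicas, r, key):
    """Dynamo-style read: ask R replicas, keep the newest answer."""
    answers = [rep.get(key) for rep in replicas[-r:]]
    answers = [a for a in answers if a is not None]
    newest = max(answers, key=lambda a: a[0], default=None)
    value = newest[1] if newest is not None else None
    return value, r  # value plus the number of replica calls made


@dataclass
class SingleOwnerStore:
    """Toy single-owner store: one server logs the edit, then applies it."""
    log: list = field(default_factory=list)
    data: dict = field(default_factory=dict)

    def write(self, key, ts, value):
        self.log.append((key, ts, value))  # stands in for the HDFS-backed log
        self.data[key] = value
        return 1  # one call from the client's point of view

    def read(self, key):
        return self.data.get(key), 1  # one copy to ask, nothing to resolve


if __name__ == "__main__":
    replicas = [Replica() for _ in range(3)]
    quorum_write(replicas, 3, "row1", 1, "v1")  # all three replicas see v1
    quorum_write(replicas, 2, "row1", 2, "v2")  # only two replicas see v2
    print(quorum_read(replicas, 2, "row1"))     # one stale answer, one fresh

    store = SingleOwnerStore()
    store.write("row1", 1, "v1")
    store.write("row1", 2, "v2")
    print(store.read("row1"))
```

The point of the toy is only the shape of the work: the quorum read has to compare answers from R replicas, while the single-owner read asks one place.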
Read speed is very high in HBase, since it doesn't have to resolve conflicts across "R" replicas. I can get per-node speed up to 300-400k rows/node sustained (on i7-based hardware).

Good luck out there, let us know if we can help!

-ryan

On Mon, Nov 23, 2009 at 2:09 PM, Adam Fisk <a...@littleshoot.org> wrote:
> Thanks guys - super helpful. My background is in p2p, but I adhere to
> Martin Fowler's "First Law of Distributed Object Design" wherever
> possible - Don’t distribute your objects! The timestamp trick for
> avoiding hotspots makes a lot of sense, and it's tough to argue with
> "hbase is faster," as I generally prefer faster.
>
> I'm surprised HBase is faster for writes given Cassandra's eventual
> consistency model. Can anyone explain why? Is it because HBase somehow
> knows where data has been replicated to, and just sends the queries to
> those nodes?
>
> It's extremely exciting both projects exist at all, and thanks for all
> your hard work. Depending on which route we go, I might be piping up
> on the list much more often.
>
> Thanks again.
>
> -Adam
>
>
> On Mon, Nov 23, 2009 at 12:09 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>> Ah the classic. Well since you're on the HBase list, my suggestion is
>> going to have to be "use HBase". There are other advantages to HBase
>> over cassandra:
>>
>> - atomic row changes
>> - row locking
>> - increment value operation
>> - strong local consistency
>> - multiple versioning
>> - no possibility of corrupted data due to normal operations
>> - hbase is faster! read and write
>> - more flexible clustering strategy - you CAN grow a HBase cluster 2x,
>> 4x, 10x instantly.
>>
>> So it really isnt just "hadoop + caching". There is much more here,
>> and there are some significant and difficult to describe downsides to
>> the Cassandra model. If you peruse their mailing list you will see
>> phrases like "pick your tokens carefully" and "the order partitioner
>> doesnt evenly load all boxes" etc. You have to manage your keyspace
>> very carefully with cassandra, whereas with hbase the major concern is
>> to not have a key hotspot (eg: always appending with timestamp).
>>
>> Another way to decide in the absence of information is to look at the
>> underlying models, bigtable vs dynamo. Dynamo is used in the shopping
>> cart at Amazon and _nothing else_. Bigtable is used by nearly every
>> Google product and drives Google App Engine. A recent presentation
>> said the largest Bigtable instance was 40 PB. The dynamo paper said
>> there were scaling problems at a few hundred nodes (gossip breaks
>> down).
>>
>> I strongly believe that the bigtable model is more flexible, more
>> suitable for more purposes and generally more scalable than the dynamo
>> model. The evidence is pale and stark.
>>
>> One last note, it seems that most Cassandra installations tend to use
>> it for really only 1 purpose and that is it. Take Facebook, I have
>> not heard they have expanded the use of Cassandra beyond inbox search.
>> If you aren't growing, you're dying.
>>
>> -ryan
>>
>> On Mon, Nov 23, 2009 at 11:56 AM, Tim Robertson
>> <timrobertson...@gmail.com> wrote:
>>> Hi Adam,
>>>
>>> I am not the person to answer having not used Cassandra, but have
>>> spotted this being discussed on the list recently on a long thread:
>>>
>>> Search for "Cassandra vs HBase" on this page:
>>> http://mail-archives.apache.org/mod_mbox/hadoop-hbase-user/200909.mbox/thread
>>>
>>> There is also an article:
>>> http://www.roadtofailure.com/2009/10/29/hbase-vs-cassandra-nosql-battle/
>>>
>>> Hope this helps with your background reading.
>>>
>>> Cheers,
>>> Tim
>>>
>>> On Mon, Nov 23, 2009 at 8:34 PM, Adam Fisk <a...@littleshoot.org> wrote:
>>>> Hi Everyone- I'm implementing a new data layer and am struggling to
>>>> decide between HBase and Cassandra. The primary advantages of HBase as
>>>> far as I can tell are:
>>>>
>>>> 1) Tighter integration with Hadoop, making it easier to run M/R for
>>>> reporting and analytics
>>>> 2) Better caching layer
>>>>
>>>> Cassandra's thrift API seems a little more fleshed out to me, and
>>>> Facebook and Twitter give it a strong stamp of approval.
>>>>
>>>> Read performance is a major concern in our case. Can anyone lend a
>>>> hand in this debate? It seems difficult to me because there are likely
>>>> few people who have done significant implementations in both, but any
>>>> help is much appreciated.
>>>>
>>>> Thanks so much.
>>>>
>>>> -Adam
>>>>
>>>> --
>>>> Adam Fisk
>>>> http://www.littleshoot.org | http://adamfisk.wordpress.com |
>>>> http://twitter.com/adamfisk
>>>>
>
> --
> Adam Fisk
> http://www.littleshoot.org | http://adamfisk.wordpress.com |
> http://twitter.com/adamfisk
>
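
On the key hotspot concern raised in the thread (always appending with timestamp): the thread does not spell out the "timestamp trick", so the following is not necessarily what was meant. It is a minimal sketch of one common way to keep timestamp-ordered keys from all landing on the same server, by prefixing each key with a small hash-derived bucket. The bucket count, the key layout, and the names `NUM_BUCKETS` and `salted_key` are assumptions made up for this example.

```python
import hashlib

NUM_BUCKETS = 16  # assumed value; would be tuned to the cluster


def salted_key(source_id: str, ts_millis: int) -> bytes:
    """Build a row key of the form bucket|timestamp|source.

    The bucket is derived from the source id, so rows written at the
    same moment by different sources start with different prefixes
    instead of piling up at the tail of the key range.
    """
    digest = hashlib.md5(source_id.encode("utf-8")).digest()
    bucket = digest[0] % NUM_BUCKETS
    # Zero-padding keeps byte order equal to numeric order per bucket.
    return f"{bucket:02d}|{ts_millis:013d}|{source_id}".encode("utf-8")


if __name__ == "__main__":
    for source in ("sensor-a", "sensor-b", "sensor-c"):
        print(salted_key(source, 1258999740000))
```

The cost of this layout is on the read side: a scan over a time range has to be issued once per bucket and the results merged.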