Re: hbase versus cassandra

Ryan Rawson Mon, 23 Nov 2009 15:57:03 -0800

HBase is about the same speed as Cassandra, or slightly faster.
Cassandra does a write by sending "W" requests out.  HBase is 1 call,
and since it overlays HDFS there are calls out to HDFS to persist in a
log.  So the speeds should be about the same.  I can get 100-300k
writes/sec to a cluster (19 nodes).
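
To make the "W" requests point concrete, here is a rough sketch of a
Dynamo-style write that only counts as successful once W replicas have
acknowledged it. This is a toy model, not Cassandra's actual code: the
replicas are plain dicts, None stands in for an unreachable node, and
the name quorum_write is made up for illustration.

```python
def quorum_write(replicas, key, value, timestamp, w):
    """Send the write to every replica; succeed once at least w acknowledge.

    Toy model (assumption): each replica is a dict, and None marks a
    replica that cannot be reached.
    """
    acks = 0
    for store in replicas:
        if store is None:  # unreachable replica, no acknowledgement
            continue
        store[key] = (timestamp, value)  # keep the timestamp for later reads
        acks += 1
    return acks >= w


if __name__ == "__main__":
    nodes = [{}, {}, None]
    print(quorum_write(nodes, "row1", "hello", timestamp=1, w=2))
```

The HBase side of the comparison would be a single call to the one
server that owns the row, which then persists to its log on HDFS.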


Read speed is very high in HBase, since it doesn't have to resolve
conflicts across "R" replicas.  I can get sustained per-node speeds of
up to 300-400k rows/node (on i7-based hardware).
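
As a rough illustration of the read-side difference, the sketch below
contrasts a read that has to reconcile answers from R replicas with a
read served by a single authoritative copy. It is a simplification
under stated assumptions: replicas are dicts holding (timestamp, value)
pairs, conflicts are resolved by newest timestamp, and the function
names are hypothetical rather than taken from either project.

```python
def quorum_read(replicas, key, r):
    """Ask the first r replicas and keep the version with the newest timestamp."""
    responses = [store[key] for store in replicas[:r] if key in store]
    if not responses:
        return None
    # Conflict resolution step: the highest timestamp wins (an assumption).
    _, value = max(responses, key=lambda pair: pair[0])
    return value


def single_read(store, key):
    """Read from the one copy that owns the row; nothing to reconcile."""
    entry = store.get(key)
    return None if entry is None else entry[1]


if __name__ == "__main__":
    nodes = [{"row1": (1, "old")}, {"row1": (2, "new")}, {}]
    print(quorum_read(nodes, "row1", r=2))
    print(single_read(nodes[1], "row1"))
```

The quorum version does R lookups plus a comparison per read, while the
single-copy version does one lookup.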

Good luck out there, let us know if we can help!
-ryan

On Mon, Nov 23, 2009 at 2:09 PM, Adam Fisk <a...@littleshoot.org> wrote:
> Thanks guys - super helpful. My background is in p2p, but I adhere to
> Martin Fowler's "First Law of Distributed Object Design" wherever
> possible - Don’t distribute your objects! The timestamp trick for
> avoiding hotspots makes a lot of sense, and it's tough to argue with
> "hbase is faster," as I generally prefer faster.
>
> I'm surprised HBase is faster for writes given Cassandra's eventual
> consistency model. Can anyone explain why? Is it because HBase somehow
> knows where data has been replicated to, and just sends the queries to
> those nodes?
>
> It's extremely exciting both projects exist at all, and thanks for all
> your hard work. Depending on which route we go, I might be piping up
> on the list much more often.
>
> Thanks again.
>
> -Adam
>
>
> On Mon, Nov 23, 2009 at 12:09 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>> Ah the classic.  Well since you're on the HBase list, my suggestion is
>> going to have to be "use HBase".  There are other advantages to HBase
>> over cassandra:
>>
>> - atomic row changes
>> - row locking
>> - increment value operation
>> - strong local consistency
>> - multiple versioning
>> - no possibility of corrupted data due to normal operations
>> - hbase is faster! read and write
>> - more flexible clustering strategy - you CAN grow an HBase cluster 2x,
>> 4x, 10x instantly.
>>
>> So it really isn't just "hadoop + caching".  There is much more here,
>> and there are some significant and difficult to describe downsides to
>> the Cassandra model.  If you peruse their mailing list you will see
>> phrases like "pick your tokens carefully" and "the order partitioner
>> doesn't evenly load all boxes" etc.  You have to manage your keyspace
>> very carefully with cassandra, whereas with hbase the major concern is
>> to not have a key hotspot (eg: always appending with timestamp).
>>
>> Another way to decide in the absence of information is to look at the
>> underlying models, bigtable vs dynamo.  Dynamo is used in the shopping
>> cart at Amazon and _nothing else_.  Bigtable is used by nearly every
>> Google product and drives Google App Engine. A recent presentation
>> said the largest Bigtable instance was 40 PB.  The dynamo paper said
>> there were scaling problems at a few hundred nodes (gossip breaks
>> down).
>>
>> I strongly believe that the bigtable model is more flexible, more
>> suitable for more purposes and generally more scalable than the dynamo
>> model.  The evidence is pale and stark.
>>
>> One last note, it seems that most Cassandra installations tend to use
>> it for really only 1 purpose and that is it.  Take Facebook, I have
>> not heard they have expanded the use of Cassandra beyond inbox search.
>> If you aren't growing, you're dying.
>>
>> -ryan
>>
>> On Mon, Nov 23, 2009 at 11:56 AM, Tim Robertson
>> <timrobertson...@gmail.com> wrote:
>>> Hi Adam,
>>>
>>> I am not the person to answer having not used Cassandra, but have
>>> spotted this being discussed on the list recently on a long thread:
>>>
>>> Search for "Cassandra vs HBase" on this page:
>>> http://mail-archives.apache.org/mod_mbox/hadoop-hbase-user/200909.mbox/thread
>>>
>>> There is also an article:
>>> http://www.roadtofailure.com/2009/10/29/hbase-vs-cassandra-nosql-battle/
>>>
>>> Hope this helps with your background reading.
>>>
>>> Cheers,
>>> Tim
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Nov 23, 2009 at 8:34 PM, Adam Fisk <a...@littleshoot.org> wrote:
>>>> Hi Everyone- I'm implementing a new data layer and am struggling to
>>>> decide between HBase and Cassandra. The primary advantages of HBase as
>>>> far as I can tell are:
>>>>
>>>> 1) Tighter integration with Hadoop, making it easier to run M/R for
>>>> reporting and analytics
>>>> 2) Better caching layer
>>>>
>>>> Cassandra's thrift API seems a little more fleshed out to me, and
>>>> Facebook and Twitter give it a strong stamp of approval.
>>>>
>>>> Read performance is a major concern in our case. Can anyone lend a
>>>> hand in this debate? It seems difficult to me because there are likely
>>>> few people who have done significant implementations in both, but any
>>>> help is much appreciated.
>>>>
>>>> Thanks so much.
>>>>
>>>> -Adam
>>>>
>>>> --
>>>> Adam Fisk
>>>> http://www.littleshoot.org | http://adamfisk.wordpress.com |
>>>> http://twitter.com/adamfisk
>>>>
>>>
>>
>
>
>
> --
> Adam Fisk
> http://www.littleshoot.org | http://adamfisk.wordpress.com |
> http://twitter.com/adamfisk
>
