After chatting with some people from Facebook, we realized that one potential benefit of using HDFS is more efficient recovery when a node loses only part of its data. Suppose a node loses a single disk. HDFS can quickly rebuild the blocks that were on the failed disk, in parallel. This is harder to do in Cassandra, since we can't easily identify, from another node, which data was on the failed disk. So when this happens, the whole node probably has to be taken out and bootstrapped again. The same problem exists when a single SSTable file is corrupted.
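To make the single-disk case concrete, here is a rough sketch of the idea, not HDFS's actual code. It assumes a simplified block map (block id -> set of (node, disk) replica locations) of the kind a central metadata service can keep. The names plan_rebuild, block_map and failed are made up for illustration. Given such a map, the blocks that lived on the failed disk can be listed directly and re-copied from several surviving nodes at once:

```python
# Hypothetical, simplified model: block id -> set of (node, disk) replica locations.
# This is an illustration of the idea only, not how HDFS is implemented.

def plan_rebuild(block_map, failed):
    """Return {source_node: [block_ids]} for blocks that lost a replica on `failed`."""
    plan = {}
    for block_id, locations in sorted(block_map.items()):
        if failed not in locations:
            continue  # this block had no replica on the failed disk
        survivors = sorted(loc for loc in locations if loc != failed)
        if not survivors:
            continue  # no surviving replica, so nothing to copy from
        # Pick the surviving replica on the least-loaded node so the copies
        # are spread across the cluster and can run in parallel.
        source = min(survivors, key=lambda loc: (len(plan.get(loc[0], [])), loc))
        plan.setdefault(source[0], []).append(block_id)
    return plan


block_map = {
    "b1": {("n1", "d1"), ("n2", "d1"), ("n3", "d1")},
    "b2": {("n1", "d1"), ("n3", "d2"), ("n4", "d1")},
    "b3": {("n1", "d2"), ("n2", "d1")},
    "b4": {("n1", "d1"), ("n2", "d2"), ("n4", "d1")},
}
failed = ("n1", "d1")  # node n1 lost its disk d1

for node, blocks in sorted(plan_rebuild(block_map, failed).items()):
    print(node, "re-sends", blocks)
```

Only the blocks that were on the failed disk are touched, and each one comes from a different source node. Without a per-disk map like this, there is no direct way to ask another node for exactly the data that was lost, which is why the fallback is to bootstrap the whole node.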

Jun
IBM Almaden Research Center
K55/B1, 650 Harry Road, San Jose, CA 95120-6099

[email protected]

[email protected] wrote on 11/21/2009 03:50:22 PM:

> cassandra over hbase
>
> Adam Fisk
> to: cassandra-user
> 11/21/2009 03:51 PM
>
> Sent by: [email protected]
>
> Please respond to cassandra-user
>
> I'm trying to navigate the rapidly shifting tides in NoSQL land, and I'm particularly struggling with using Cassandra versus HBase. They functionally seem quite similar to me even if the implementations are quite different.
>
> What would people on the list say are the primary reasons to use Cassandra over HBase? HA and speed are very important for my application. HBase's tighter integration with Hadoop and therefore easier reporting and analytics using M/R appeals to me, but I intuitively prefer the Cassandra community and generally like the architectural approach. HBase's Hadoop foundations also strike me as both an advantage and a disadvantage, as it seems to tie their hands a bit.
>
> Thanks for any advice you can give!
>
> -Adam
>
> --
> Adam Fisk
> http://www.littleshoot.org | http://adamfisk.wordpress.com | http://twitter.com/adamfisk
