Thanks Amandeep and Ryan,
I was able to confirm that, unlike Cassandra, HBase does not do in-memory
replication. So the paragraph below from Yahoo's report is partly incorrect:
Cassandra, sharded MySQL and PNUTS, all updates were
synched to disk before returning to the client. HBase does
not sync to disk, but relies on in-memory replication across
multiple servers for durability; this increases write throughput
and reduces latency, but can result in data loss on failure.
Maumau
----- Original Message -----
From: "Ryan Rawson" <ryano...@gmail.com>
To: <hbase-user@hadoop.apache.org>
Sent: Sunday, May 09, 2010 7:10 AM
Subject: Re: Does HBase do in-memory replication of rows?
For more architectural details of HBase, check out the Bigtable paper;
it's fairly detailed, short and accessible.
On Sat, May 8, 2010 at 2:39 PM, Amandeep Khurana <ama...@gmail.com> wrote:
HBase does not do in-memory replication. Your data goes into a region,
which has only one instance. Writes go to the write-ahead log first, which
is written to disk. However, since HDFS doesn't yet have fully working
flush functionality, there is a chance of losing a chunk of data. The next
release of HBase will guarantee data durability, since by then the flush
functionality should be fully working.
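
The write ordering described above (log first, then the region's in-memory
state) and the effect of a missing sync can be illustrated with a toy model.
This is only a sketch in Python, not HBase code: ToyWal, ToyRegion,
sync_on_write and crash_and_recover are made-up names, and the "disk" is
simulated with a plain list.

```python
class ToyWal:
    """Toy write-ahead log (hypothetical, not the HBase WAL)."""

    def __init__(self):
        self.buffer = []   # appended entries that are not yet durable
        self.durable = []  # entries that survive a simulated crash

    def append(self, entry):
        self.buffer.append(entry)

    def sync(self):
        # Move everything buffered so far into the durable part of the log.
        self.durable.extend(self.buffer)
        self.buffer.clear()


class ToyRegion:
    """Toy region with a single instance and an in-memory store."""

    def __init__(self, sync_on_write):
        self.wal = ToyWal()
        self.memstore = {}
        self.sync_on_write = sync_on_write

    def put(self, row, value):
        self.wal.append((row, value))  # 1. write to the log first
        if self.sync_on_write:
            self.wal.sync()            # 2. make the log entry durable
        self.memstore[row] = value     # 3. apply the write in memory

    def crash_and_recover(self):
        # A crash loses the in-memory store and any unsynced log entries;
        # recovery replays only the durable part of the log.
        self.wal.buffer.clear()
        self.memstore = dict(self.wal.durable)
        return self.memstore
```

With sync_on_write=False, a put followed by crash_and_recover() comes back
empty, which is the data-loss window mentioned above. With sync_on_write=True
the row is replayed from the log.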
Regarding replication: the difference between Cassandra and HBase is that
when you do a write in Cassandra, it doesn't return until it has been
written to W nodes, where W is configurable. In the case of HBase,
replication is taken care of by the filesystem (HDFS). When the region is
flushed to disk, HDFS replicates the HFiles (in which the data for the
regions is stored).
For more details on how this works, read the Bigtable paper and
http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html.
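
The Cassandra side of the comparison (a write returns only once W nodes have
it) can be sketched as follows. This is a toy illustration in Python under
simplifying assumptions, not Cassandra's client API: write_to_w_nodes and the
replica dictionaries with "up" and "data" keys are hypothetical.

```python
def write_to_w_nodes(replicas, w, key, value):
    """Return True only if at least w replicas acknowledged the write.

    Each replica is a dict with an "up" flag and a "data" dict; both are
    assumptions made for this sketch.
    """
    acks = 0
    for replica in replicas:
        if replica["up"]:
            replica["data"][key] = value  # replica stores the write
            acks += 1                     # and acknowledges it
    return acks >= w
```

For example, with three replicas of which one is down, a write with w=2
succeeds and a write with w=3 does not.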