Hello,

I'm comparing HBase and Cassandra, which I think are the most promising distributed key-value stores, to decide which one to use for future OLTP and data analysis workloads. I found the following benchmark report by Yahoo! Research, which evaluates HBase, Cassandra, PNUTS, and sharded MySQL.

http://www.brianfrankcooper.net/pubs/ycsb-v4.pdf
http://www.brianfrankcooper.net/pubs/ycsb.pdf

The above report refers to HBase 0.20.3.
After reading the report and HBase's documentation, I have two questions about load balancing and replication. Could anyone give me information that would help answer them?

[Q2] replication
Does HBase perform in-memory replication of rows like Cassandra?
Does HBase sync updates to disk before returning success to clients?

According to the following paragraph in the HBase design overview, HBase syncs writes.

----------------------------------------
Write Requests
When a write request is received, it is first written to a write-ahead log called a HLog. All write requests for every region the region server is serving are written to the same HLog. Once the request has been written to the HLog, the result of changes is stored in an in-memory cache called the Memcache. There is one Memcache for each Store.
----------------------------------------

The source code of the Put class appears to confirm this (though I don't understand the server-side code yet):

 private boolean writeToWAL = true;
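
To check my understanding, here is a toy sketch of the write order the overview paragraph describes: log first, in-memory cache second. This is not HBase code. The class ToyRegionServer and its methods are names I made up for illustration, and the sketch deliberately says nothing about whether the log append reaches disk, since that is exactly what I am asking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model only: NOT HBase code. All names here are hypothetical.
public class ToyRegionServer {
    // Stand-in for the HLog: one append-only log shared by every region.
    private final List<String> hlog = new ArrayList<>();
    // Stand-in for a Memcache: a sorted in-memory map for a single Store.
    private final TreeMap<String, String> memcache = new TreeMap<>();

    // Apply a write in the order the overview describes:
    // append to the log first, then update the in-memory cache.
    public void put(String row, String value, boolean writeToWAL) {
        if (writeToWAL) {
            // Whether a real append is synced to disk here is my open question.
            hlog.add(row + "=" + value);
        }
        memcache.put(row, value);
    }

    public int logSize() {
        return hlog.size();
    }

    public String get(String row) {
        return memcache.get(row);
    }

    public static void main(String[] args) {
        ToyRegionServer rs = new ToyRegionServer();
        rs.put("row1", "a", true);   // logged, then cached
        rs.put("row2", "b", false);  // cached only, skips the log
        System.out.println(rs.logSize() + " log entries, row2=" + rs.get("row2"));
    }
}
```

If this model is right, a write with writeToWAL set to false lives only in memory until the cache is flushed, so the durability of a normal write depends entirely on what happens at the log append.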

However, Yahoo's report says the following. Is this incorrect? What is in-memory replication? As far as I know, HBase relies on HDFS to replicate data in storage, not in memory.

----------------------------------------
For Cassandra, sharded MySQL and PNUTS, all updates were
synched to disk before returning to the client. HBase does
not sync to disk, but relies on in-memory replication across
multiple servers for durability; this increases write throughput
and reduces latency, but can result in data loss on failure.
----------------------------------------

Maumau
