I think HDFS-200 does call fflush, so it will get to the OS buffers...

-ryan

On Sat, May 8, 2010 at 8:28 PM, Todd Lipcon <t...@cloudera.com> wrote:
> I think the point they were trying to make in the YCSB paper is that,
> even with the WAL and hflush(), the data does not get synced to disk.
> hflush() only ensures that the WAL data has made it to three
> datanodes' OS caches, but doesn't actually guarantee anything is on
> physical media.
>
> I agree it's not clearly articulated, but that's what's happening. The
> API that will cause fsync() on the DNs is called hsync() and has not
> been written yet.
>
> -Todd
>
> On Sat, May 8, 2010 at 6:12 PM, Ryan Rawson <ryano...@gmail.com> wrote:
>>
>> Yes that section is very misleading. What is actually happening is like so:
>>
>> - Every time you write to HBase the data is written to a Write Ahead Log.
>> - If there is a regionserver failure the log is replayed to recover the data
>> - Due to a HDFS bug, the data in the most recent file, which is
>>   rotated at 64MB by default, is lost.
>>
>> The other good news is that serious effort is being undertaken to push
>> a version of HDFS without this bug. Hopefully within a week people
>> will be able to download a version of HDFS and not run into this
>> situation.
>>
>> -ryan
>>
>> On Sat, May 8, 2010 at 6:10 PM, MauMau <maumau...@gmail.com> wrote:
>> > Thanks Amandeep and Ryan,
>> >
>> > I could make sure that unlike Cassandra, HBase does not do in-memory
>> > replication. So, the paragraph below in Yahoo's report is partly incorrect:
>> >
>> > Cassandra, sharded MySQL and PNUTS, all updates were
>> > synched to disk before returning to the client. HBase does
>> > not sync to disk, but relies on in-memory replication across
>> > multiple servers for durability; this increases write throughput
>> > and reduces latency, but can result in data loss on failure.
>> >
>> > Maumau
>> >
>> >
>> > ----- Original Message -----
>> > From: "Ryan Rawson" <ryano...@gmail.com>
>> > To: <hbase-user@hadoop.apache.org>
>> > Sent: Sunday, May 09, 2010 7:10 AM
>> > Subject: Re: Does HBase do in-memory replication of rows?
>> >
>> >
>> > For more architectural details of HBase, check out the bigtable paper,
>> > it's fairly detailed, short and accessible.
>> >
>> > On Sat, May 8, 2010 at 2:39 PM, Amandeep Khurana <ama...@gmail.com> wrote:
>> >>
>> >> HBase does not do in-memory replication. Your data goes into a region, which
>> >> has only one instance. Writes go to the write ahead log first, which is
>> >> written to the disk. However, since HDFS doesnt yet have a fully performing
>> >> flush functionality, there is a chance of losing the chunk of data. The next
>> >> release of HBase will guarantee data durability since by then the flush
>> >> functionality would be fully working.
>> >>
>> >> Regarding replication - the difference between Cassandra and HBase is that
>> >> when you do a write in Cassandra, it doesnt return unless it has written to
>> >> W nodes, which is configurable. In case of HBase, the replication is taken
>> >> care of by the filesystem (HDFS). When the region is flushed to the disk,
>> >> HDFS replicates the HFiles (in which the data for the regions is stored).
>> >> For more details of the working, read the Bigtable paper and
>> >> http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html.
>> >
>> >
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
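
For anyone skimming the thread, the flush-versus-sync distinction Todd describes can be illustrated with a rough local-file analogy. This is only a sketch and not the HDFS API: it uses Python's standard `flush()` and `os.fsync()` on an ordinary file, and the names `append_record` and `read_records` are hypothetical helpers invented for the example. The `flush()` step plays the role of hflush() (data reaches the OS cache), and the `os.fsync()` step plays the role of the not-yet-written hsync() (the OS is asked to put the data on physical media).

```python
import os
import tempfile


def append_record(path, record, durable):
    """Append one record to a local log file and return the bytes written.

    durable=False stops at flush(): the data is handed to the OS cache only.
    durable=True also calls os.fsync(): the OS is asked to write it to the device.
    (Hypothetical helper for illustration; not part of HBase or HDFS.)
    """
    data = record.encode("utf-8") + b"\n"
    with open(path, "ab") as log:
        log.write(data)              # sits in the process's user-space buffer
        log.flush()                  # now in the OS cache, not necessarily on disk
        if durable:
            os.fsync(log.fileno())   # request that cached data be written to disk
    return len(data)


def read_records(path):
    """Read the log back as a list of strings (hypothetical helper)."""
    with open(path, "rb") as log:
        return [line.decode("utf-8") for line in log.read().splitlines()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        wal = os.path.join(tmp, "wal.log")
        append_record(wal, "put row1", durable=False)  # flush only
        append_record(wal, "put row2", durable=True)   # flush + fsync
        print(read_records(wal))
```

Both records are readable afterwards because reads are served from the OS cache; the two paths differ only in what would survive a power loss, which is the gap the thread is discussing.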