Performance. It's all about performance. In my own tests, calling sync() on every single commit in HDFS-0.21 caps small-row writes at about 1200 rows a second. One way to speed things up is to sync less often; another is to sync on a timer instead. Both are going to matter a lot more in HDFS-0.21/HBase-0.21.
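To make the two options concrete, here is a minimal sketch of a policy that syncs only after a number of appends or once a time limit has passed. It is not the HLog code: the class name SyncPolicy, its fields, and the injectable clock are all hypothetical, and the time limit is checked on each append rather than by a background timer thread.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch: decides when a write-ahead log should call sync(). */
final class SyncPolicy {
    private final int maxUnsyncedEntries;  // sync once this many appends are pending
    private final long maxDelayMillis;     // or once this much time has passed
    private final LongSupplier clock;      // time source in milliseconds (injectable)
    private int unsynced = 0;
    private long lastSyncMillis;
    private long syncCount = 0;

    SyncPolicy(int maxUnsyncedEntries, long maxDelayMillis, LongSupplier clock) {
        this.maxUnsyncedEntries = maxUnsyncedEntries;
        this.maxDelayMillis = maxDelayMillis;
        this.clock = clock;
        this.lastSyncMillis = clock.getAsLong();
    }

    /** Records one appended entry; returns true if this append triggered a sync. */
    boolean append() {
        unsynced++;
        long now = clock.getAsLong();
        if (unsynced >= maxUnsyncedEntries || now - lastSyncMillis >= maxDelayMillis) {
            // A real log would call sync() on the underlying file here.
            syncCount++;
            unsynced = 0;
            lastSyncMillis = now;
            return true;
        }
        return false;
    }

    long syncCount() { return syncCount; }

    int unsynced() { return unsynced; }
}
```

The trade-off is the one raised in the quoted message: whenever append() returns false, that entry is not yet durable, so a caller that needs a durable commit still has to wait for the next real sync.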
If we are talking about hdfs/hadoop 0.20, it hardly matters either way, there is that whole 'no append/sync' thing you know all about.

-ryan

On Mon, Jan 11, 2010 at 3:46 PM, Joydeep Sarma <jssa...@apache.org> wrote:
> Hey HBase-devs,
>
> we have been going through hbase code to come up to speed.
>
> One of the questions was regarding the commit semantics. Thumbing through
> the RegionServer code that's appending to the wal:
>
> syncWal -> HLog.sync -> addToSyncQueue ->syncDone.await()
>
> and the log writer thread calls:
>
> hflush(), syncDone.signalAll()
>
> however hflush doesn't necessarily call a sync on the underlying log file:
>
> if (this.forceSync ||
>     this.unflushedEntries.get() >= this.flushlogentries) { ... sync()
> ... }
>
> so it seems that if forceSync is not true, the syncWal can unblock before a
> sync is called (and forcesync seems to be only true for metaregion()).
>
> are we missing something - or is there a bug here (the signalAll should be
> conditional on hflush having actually flushed something).
>
> thanks,
>
> Joydeep
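
The condition suggested in the quoted message (wake waiters only when a sync has really happened) could be sketched roughly as below. This is an illustration under assumptions, not the HLog implementation: the class SyncGate, the sequence-number scheme, and all method names are hypothetical. Each waiter blocks until a sync covering its own entry has completed, so a skipped sync cannot unblock it.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch: waiters wake only after a real sync covers their entry. */
final class SyncGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition syncDone = lock.newCondition();
    private long appendedSeq = 0;  // highest sequence number handed out
    private long syncedSeq = 0;    // highest sequence number known to be synced

    /** Registers an appended entry and returns its sequence number. */
    long append() {
        lock.lock();
        try {
            return ++appendedSeq;
        } finally {
            lock.unlock();
        }
    }

    /** Blocks until a sync covering the given sequence number has completed. */
    void awaitSynced(long seq) {
        lock.lock();
        try {
            // Loop guards against spurious wakeups and syncs that stop short of seq.
            while (syncedSeq < seq) {
                syncDone.awaitUninterruptibly();
            }
        } finally {
            lock.unlock();
        }
    }

    /** Called by the log writer thread only after sync() has actually returned. */
    void markSynced(long upToSeq) {
        lock.lock();
        try {
            if (upToSeq > syncedSeq) {
                syncedSeq = upToSeq;
                syncDone.signalAll();  // signal only when durability really advanced
            }
        } finally {
            lock.unlock();
        }
    }

    /** Non-blocking check, mainly useful for tests. */
    boolean isSynced(long seq) {
        lock.lock();
        try {
            return syncedSeq >= seq;
        } finally {
            lock.unlock();
        }
    }
}
```

In this sketch the writer thread would call markSynced() only on the branch where sync() ran, which is the "signalAll should be conditional" behaviour asked about above.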