2010/5/16 Tatsuya Kawano <tatsuya6...@gmail.com>

> 2. On Hadoop trunk, I'd prefer not to hflush() every single put, but rely
> on un-flushed replicas on HDFS nodes, so I can avoid the performance penalty.
> Will this still be durable? Will HMaster see un-flushed appends right after
> a region server failure?

If you don't call hflush(), you can still lose edits up to the last block
boundary, since hflush is required to persist block locations to the namenode.
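For concreteness, here's a minimal sketch of where hflush() sits in the HDFS
write path. The class name, output path, and "edit" payload are made up for
illustration - this is not taken from HBase's WAL code, just the plain
FSDataOutputStream API:

import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HflushSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical log path, purely for illustration.
        Path log = new Path("/tmp/hflush-sketch.log");

        try (FSDataOutputStream out = fs.create(log, true)) {
            out.write("edit #1\n".getBytes(StandardCharsets.UTF_8));

            // Push the client-side buffer out to every datanode in the write
            // pipeline so the edit is visible to new readers and survives a
            // writer crash. This does NOT force the bytes to disk on the
            // datanodes - they may still only be in memory there.
            out.hflush();
        }
    }
}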
hflush() does *not* sync to disk - it just makes sure that the edits are in
memory on all of the replicas.

I have some patches staged for CDH3 that will also make the performance of
this quite competitive by pipelining hflushes - basically it has little to no
effect on throughput and only a few ms of penalty on each write.

-Todd

--
Todd Lipcon
Software Engineer, Cloudera