Hi,
Given the topic of this message, I'd like to point out that BookKeeper
(HBASE-2315) provides a strong durability guarantee: we sync writes to
disk on a quorum of machines. I don't think this feature is currently
on the roadmap for HBase, though.
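
To illustrate, here is a minimal sketch of how a log record could be
written durably through the BookKeeper client API (the ZooKeeper
address, ensemble/quorum sizes, and ledger password are just example
values I picked for the sketch):

    import org.apache.bookkeeper.client.BookKeeper;
    import org.apache.bookkeeper.client.LedgerHandle;

    public class BookKeeperSketch {
      public static void main(String[] args) throws Exception {
        // Sketch only; the address and values below are examples.
        BookKeeper bk = new BookKeeper("zk1:2181");
        // 3-bookie ensemble; each entry is acknowledged only after the
        // 2-bookie write quorum has synced it to disk.
        LedgerHandle lh = bk.createLedger(3, 2, BookKeeper.DigestType.MAC,
                                          "secret".getBytes());
        lh.addEntry("log record".getBytes());
        lh.close();
        bk.close();
      }
    }
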
Thanks,
-Flavio
On May 17, 2010, at 6:01 AM, Tatsuya Kawano wrote:
Hi,
A few days ago, I had a discussion with other Japanese developers on
the hadoop-jp Google group about HLog durability on the recent Hadoop
releases (0.20.1, 0.20.2). I had never looked at this issue closely
until then, as I was planning to use Hadoop 0.21 from the beginning.
Someone showed us Todd's presentation at the March 2010 HUG, and we all
agreed that in order to solve this issue, we will need to use Hadoop
trunk or Cloudera CDH3, which include HDFS-200 and related patches.
Then I came up with a couple of questions:
1. On Hadoop 0.20.x (without the HDFS-200 patch), I must close the
HLog to make its entries durable, right? Rolling the HLog does this,
but what about a region server failure?
Someone in the discussion tried this scenario: he killed (-9) a region
server process after a few puts. The HLog was read by the HMaster
before it was closed; the HMaster couldn't see any entries in the log
and simply deleted it, so he lost some puts.
Is this the expected behavior? He used Hadoop 0.20.1 and HBase 0.20.3.
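
To make the scenario concrete, a plain client put is all it takes; a
sketch against the HBase 0.20.x client API (the table and column names
are placeholders I made up):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PutThenKill {
      public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "test");
        Put put = new Put(Bytes.toBytes("row1"));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v1"));
        // Returns once the edit is appended to the region server's HLog;
        // without HDFS-200 those bytes are not durable until the log file
        // is closed.
        table.put(put);
        // kill -9 the region server here: the HLog was never closed, so
        // the master can read it back as empty and the put is lost.
      }
    }
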
2. On Hadoop trunk, I'd prefer not to call hflush() on every single
put, but instead rely on the un-flushed replicas on the HDFS nodes, so
I can avoid the performance penalty. Will this still be durable? Will
the HMaster see the un-flushed appends right after a region server
failure?
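
For context, flushing every single put would mean something like this
at the HDFS level; a sketch against the Hadoop trunk (0.21) API, where
the path and record contents are made up:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HflushSketch {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FSDataOutputStream out = fs.create(new Path("/tmp/hlog-example"));
        for (int i = 0; i < 1000; i++) {
          out.write(("edit-" + i + "\n").getBytes("UTF-8"));
          // hflush() pushes the bytes to every datanode in the write
          // pipeline on each edit -- this is the per-put cost in question.
          out.hflush();
        }
        out.close();
      }
    }
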
Thanks in advance,
--
河野 達也
Tatsuya Kawano (mr.)
Tokyo, Japan