Hi,

A few days ago, I had a discussion with other Japanese developers on the hadoop-jp Google group about HLog durability on the recent Hadoop releases (0.20.1, 0.20.2). I had never looked at this issue closely until then, as I had been planning to use Hadoop 0.21 from the beginning.

Someone showed us Todd's presentation from HUG March 2010, and we all agreed that to solve this issue, we would need to use Hadoop trunk or Cloudera CDH3, which include HDFS-200 and the related patches.

Then I came up with a couple of questions:

1. On Hadoop 0.20.x (without the HDFS-200 patch), I must close the HLog to make its entries durable, right? Rolling the HLog does this, but what happens on a region server failure? (A small sketch of what I mean follows the scenario below.)

Someone in the discussion tried this scenario. He killed (-9) a region server process after a few puts. The HLog was read by the HMaster before it had been closed; the HMaster couldn't see any entries in the log and simply deleted it. So he lost some puts.

Is this the expected behavior? He used Hadoop 0.20.1 and HBase 0.20.3.
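
To make question 1 concrete, here is a minimal sketch (mine, not code from the discussion) of the write pattern I have in mind. The path and the put strings are just placeholders; the point is that on stock 0.20.x the bytes only become durable and visible to other readers after close():

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UnclosedWriteSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataOutputStream out = fs.create(new Path("/tmp/hlog-sketch"));

    out.writeBytes("put #1\n");  // written, but only buffered in the write pipeline
    // If the process is killed (-9) here, a reader that opens the file now
    // (e.g. the HMaster splitting the log) typically sees an empty file on
    // 0.20.x without HDFS-200, so these edits are effectively lost.

    out.close();  // only after close() are the bytes guaranteed to be visible
  }
}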

2. On Hadoop trunk, I'd prefer not to hflush() every single put, but to rely on the un-flushed replicas on the HDFS DataNodes, so I can avoid the performance penalty. Will this still be durable? Will the HMaster see the un-flushed appends right after a region server failure?
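
For reference, this is the per-put pattern I'd like to avoid, sketched against the Hadoop trunk / 0.21 API (FSDataOutputStream.hflush()); the path and loop are placeholders of mine:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PerPutHflushSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataOutputStream out = fs.create(new Path("/tmp/hlog-hflush-sketch"));

    for (int i = 0; i < 3; i++) {
      out.writeBytes("put #" + i + "\n");
      // hflush() pushes the buffered bytes to every DataNode in the pipeline
      // and makes them visible to new readers -- durable per put, but it costs
      // a round trip for every single edit.
      out.hflush();
    }
    out.close();
  }
}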

Thanks in advance,

--
河野 達也
Tatsuya Kawano (mr.)
Tokyo, Japan


