2010/5/16 Tatsuya Kawano <tatsuya6...@gmail.com>:
>
> Hi,
>
> A few days ago, I had a discussion with other Japanese developers on the
> hadoop-jp Google group. It was about HLog durability on the recent Hadoop
> releases (0.20.1, 0.20.2). I never looked at this issue closely until
> then, as I was certain to use Hadoop 0.21 from the beginning.
>
> Someone showed us Todd's presentation at HUG March 2010, and we all
> agreed that in order to solve this issue, we will need to use Hadoop
> trunk or Cloudera CDH3, which include HDFS-200 and related patches.
>
> Then I came up with a couple of questions:
>
> 1. On Hadoop 0.20.x (without the HDFS-200 patch), I must close the HLog
> to make its entries durable, right? While rolling the HLog does this,
> what about region server failure?
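Right - until the log is rolled or closed, the entries sit in the writer's buffer, and a reader opening the file sees nothing. A minimal local-file sketch of that buffering behavior (plain Python file I/O standing in for the HDFS writer here - this is an analogy, not the actual HLog code):

```python
import os
import tempfile

# Simulate a log writer whose entries live in a user-space buffer
# until flush()/close(); a concurrent reader (the "master") sees
# nothing before that.
fd, path = tempfile.mkstemp(suffix=".log")
os.close(fd)

w = open(path, "wb", buffering=8192)  # buffered writer, entry stays in memory
w.write(b"put row1\n")                # 9 bytes, well under the buffer size

before = os.path.getsize(path)        # nothing flushed yet: 0 bytes visible
w.close()                             # close flushes; entries become visible
after = os.path.getsize(path)

print(before, after)
os.remove(path)
```

If the process dies before that close(), the buffered entry is simply gone - which is the failure mode your question is about.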
The problem is that during a failure, how do you execute user code? If the
JVM segfaults hard, we have no opportunity to execute Java code.

> Someone in the discussion tried this scenario. He killed (-9) a region
> server process after a few puts. The HLog was read by the HMaster before
> it was closed. The HMaster couldn't see any entries in the log and simply
> deleted it, so he lost some puts.

Yes, this is expected. Perhaps a refresher is in order:

http://en.wikipedia.org/wiki/SIGKILL

"When sent to a program, SIGKILL causes it to terminate immediately. In
contrast to SIGTERM and SIGINT, this signal cannot be caught or ignored,
and the receiving process cannot perform any clean-up upon receiving this
signal."

As you correctly noted above, you can only replay from a Hadoop file _if_
the file was correctly closed.

> Is this the expected behavior? He used Hadoop 0.20.1 and HBase 0.20.3.
>
> 2. On Hadoop trunk, I'd prefer not to hflush() every single put, but to
> rely on un-flushed replicas on the HDFS nodes, so I can avoid the
> performance penalty. Will this still be durable? Will the HMaster see
> un-flushed appends right after a region server failure?

Could you explain how you think hflush() works and what "un-flushed
replicas" means - because none of the concepts you are alluding to exist
in either HDFS-200 or HDFS-265.

> Thanks in advance,
>
> --
> 河野 達也
> Tatsuya Kawano (mr.)
> Tokyo, Japan
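The no-clean-up point above is easy to demonstrate outside the JVM. A small POSIX-only Python sketch (the child process and its SIGTERM handler are illustrative stand-ins, not HBase code): the child installs a termination handler, then gets SIGKILL, and the handler never runs:

```python
import os
import signal
import subprocess
import sys
import textwrap

# Child installs a SIGTERM handler (its "clean-up" code), then sleeps.
child_src = textwrap.dedent("""
    import signal, sys, time
    signal.signal(signal.SIGTERM,
                  lambda s, f: (print("cleanup ran"), sys.exit(0)))
    print("ready", flush=True)
    time.sleep(30)
""")

p = subprocess.Popen([sys.executable, "-c", child_src],
                     stdout=subprocess.PIPE, text=True)
p.stdout.readline()             # wait until the handler is installed
os.kill(p.pid, signal.SIGKILL)  # like `kill -9` on the region server

out = p.stdout.read()           # empty: the clean-up line was never printed
p.wait()

print("handler output:", repr(out))
print("killed by signal:", -p.returncode)
```

A `kill -TERM` instead would let the handler run and print "cleanup ran"; with `-9` the process is gone before any user code - Java or otherwise - gets a chance.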