First, were you running a secondary namenode? Did you follow the Hadoop instructions for recovering the fsimage from the secondary's checkpoint? Is it too late for you to try that?
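If you haven't, a rough sketch of that recovery path follows; it assumes a Hadoop version whose namenode supports the -importCheckpoint option, and the directory settings mentioned in the comments are hypothetical:

  # Stop HBase and the namenode before touching DFS metadata
  # (stop-hbase.sh from the HBase install, hadoop scripts from Hadoop's).
  bin/stop-hbase.sh
  bin/hadoop-daemon.sh stop namenode
  # With dfs.name.dir empty and fs.checkpoint.dir pointing at the
  # secondary namenode's checkpoint, load the last good fsimage:
  bin/hadoop namenode -importCheckpoint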
In general, I think it would be useful for HBase to provide a recovery option where a corrupt table region can be reinitialized as empty; at least the whole table would not be lost. I have wanted something like this on occasion. It could be a new shell tool.

One thing you can do is schedule a daily maintenance window where you shut down your cluster and run a Hadoop distcp from the HBase/primary cluster to a secondary DFS cluster serving as backup media. This is akin to making a tape backup, with the same drawback: on recovery you lose all edits made since the last backup, but you do not lose everything. distcp copies the data in parallel, so the backup can complete quickly even if the tables are large. (A rough command sketch follows the quoted message below.)

   - Andy

> From: g00dn3ss <[email protected]>
> Subject: Recovering HBase after HDFS Corruption
> To: [email protected]
> Date: Wednesday, December 24, 2008, 10:40 PM
>
> Hi All,
>
> We had a hardware failure on our namenode that led to
> corruption in our DFS. I ran an fsck and moved the
> corrupted files to a lost+found directory. The DFS
> now seems to run fine by itself. However, if I run
> HBase following the fsck, I get a bunch of FileNotFound
> exceptions as it tries to access some of the files
> that were corrupted. This ultimately seems to lead to
> the HMaster getting in a bad state where it doesn't
> respond.
>
> So I'm wondering if there is a way to recover from my
> current state. [...]
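P.S. A rough sketch of the distcp backup described above; the cluster addresses and paths are hypothetical, and both clusters are assumed to run compatible Hadoop versions:

  # Quiesce HBase so the files under the HBase root are consistent.
  bin/stop-hbase.sh
  # Copy the HBase root directory to the backup DFS cluster in parallel.
  bin/hadoop distcp hdfs://primary:9000/hbase \
      hdfs://backup:9000/backups/hbase-2008-12-24
  # Restart HBase once the copy completes.
  bin/start-hbase.sh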
