Thanks for the help. I am running a secondary name node.  I didn't initially
restore from the secondary because I was able to get things working from the
primary name node.  I had the (probably mistaken) impression that restoring
from the secondary name node was a last resort only to be used when the
primary name node couldn't be recovered.  Until HBase started failing, I
didn't even consider that HBase would have problems with my removing corrupt
files.  At that point, I assumed it was too late to try the secondary,
since its checkpoint probably already reflected my fsck changes on the
primary.  I guess I will try recovering from the secondary, since it sounds
like I will otherwise lose the whole table anyway.  Alternatively, is there
a way to manually reinitialize the corrupt table regions as empty?

Thanks again!


On Thu, Dec 25, 2008 at 4:26 AM, Andrew Purtell <[email protected]> wrote:

> First, were you running a secondary name node? Did you
> follow the Hadoop instructions for recovering an fs image
> from the secondary? Is it too late for you to try it?
>
> In general, I think it may be useful for HBase to provide
> a recovery option where a corrupt table region can be
> reinitialized as empty. At least the whole table will not
> be lost. I have wanted something like this on occasion.
> This could be a new shell tool.
>
> One thing you can do is schedule daily maintenance time
> where you shut down your cluster and do a Hadoop distcp
> from the HBase/primary cluster to a secondary DFS cluster
> serving as backup media. This is akin to making a tape
> backup and has the same drawback of losing all edits
> subsequent to the last backup upon recovery, but on the
> other hand you do not lose everything. The distcp copies
> the data in a reasonably parallel fashion, so the backup
> can complete quickly even if the tables are large.
>
>   - Andy
>
> > From: g00dn3ss <[email protected]>
> > Subject: Recovering HBase after HDFS Corruption
> > To: [email protected]
> > Date: Wednesday, December 24, 2008, 10:40 PM
> > Hi All,
> >
> > We had a hardware failure on our namenode that led to
> > corruption in our DFS.  I ran an fsck and moved the
> > corrupted files to a lost+found directory.  The DFS
> > now seems to run fine by itself.  However, if I run
> > HBase following the fsck, I get a bunch of FileNotFound
> > exceptions as it tries to access some of the files
> > that were corrupted.  This ultimately seems to lead to
> > the HMaster getting in a bad state where it doesn't
> > respond.
> >
> > So I'm wondering if there is a way to recover from my
> > current state.
> [...]
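
The daily distcp backup suggested above could be scripted roughly as follows. This is a minimal sketch, not a tested procedure. The NameNode URIs, the /hbase root directory, and the date-stamped backup layout are hypothetical placeholders. It also assumes the `hadoop` command is on the PATH and that HBase has already been shut down for the maintenance window, so the files are not changing during the copy.

```python
import datetime
import subprocess
from typing import List, Optional

# Hypothetical placeholders: substitute your own NameNode addresses and paths.
PRIMARY_HBASE_ROOT = "hdfs://primary-namenode:9000/hbase"
BACKUP_BASE = "hdfs://backup-namenode:9000/backups/hbase"


def distcp_command(src: str, backup_base: str, day: datetime.date) -> List[str]:
    """Build a `hadoop distcp` command copying src into a per-day backup directory."""
    dest = f"{backup_base}/{day.isoformat()}"
    return ["hadoop", "distcp", src, dest]


def run_backup(day: Optional[datetime.date] = None) -> int:
    """Run the copy and return its exit code.

    Assumes HBase is stopped (the maintenance-window approach), so the
    snapshot on the backup cluster is consistent.
    """
    day = day or datetime.date.today()
    cmd = distcp_command(PRIMARY_HBASE_ROOT, BACKUP_BASE, day)
    print("Running:", " ".join(cmd))
    # distcp runs as a MapReduce job, which is what spreads the copy
    # across the cluster instead of streaming it through one machine.
    return subprocess.run(cmd).returncode
```

Calling `run_backup()` from a scheduled job, after stopping HBase and before restarting it, would give one backup directory per day. Keeping each day separate means a backup taken after corruption has crept in does not overwrite an older, good copy.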
