RE: HBase recovery from failed -ROOT- / .META. server

Ramkrishna.S.Vasudevan Tue, 11 Sep 2012 21:47:30 -0700

Hi Willy

Yes, I agree that META/ROOT recovery should happen as fast as possible.
Which version of HBase are you using?

A lot of fixes have gone into the latest versions regarding the recovery part.

You can also take a look at HBASE-6713 if you are using one of the latest
versions.

If you can post the logs, that would be great, so that we can identify the
scenario in which the recovery took so long.  If it looks like a bug, we can
file a JIRA and work on resolving it.

A lot of work on MTTR (mean time to recovery) is happening in the current
HBase trunk.  Input on MTTR will always be taken with the highest priority.

Thanks & Regards
Ram

> -----Original Message-----
> From: Willy Chang [mailto:willy.chang...@gmail.com]
> Sent: Tuesday, September 11, 2012 11:11 PM
> To: user@hbase.apache.org
> Subject: HBase recovery from failed -ROOT- / .META. server
> 
> It appears to take 30 minutes or so for HBase to recover from the
> failure of the regionserver holding the ROOT role. Please let me know
> what options are available to more quickly recover from such a
> situation, as when this happens our applications/SLAs are impacted.
> 
> It would also be good to be able to quickly recover from a failure of
> the regionserver which owns the .META. table. During HBase startup, a
> random server is elected to manage the ROOT and .META. tables
> (different servers). This creates a single point of failure. At the
> very least, perhaps we can find a way to force which server is
> selected for this role, perhaps even just via startup order. We could
> then assign a server which doesn't participate in flow tasks (no
> tasktracker), and so would be more stable. There may also be a config
> option for this. Wondering if there is a way to force election of a
> new ROOT/META owner within a minute or so instead of 30+ minutes.
