How can we shut down the region servers cleanly if the master is down?

Thanks
-Yair

-----Original Message-----
From: Jim Kellerman (POWERSET) [mailto:[EMAIL PROTECTED]
Sent: Friday, November 07, 2008 8:54 AM
To: [email protected]
Subject: RE: Question on Availability

Jean-Daniel is correct. Although HDFS has a "warm backup" for the namenode, it requires physical intervention to start it up if the primary namenode fails.

With respect to HBase, moving the master is not a big deal. Shutting down the region servers (cleanly: no kill -9), pushing a new config which indicates where the new master will live and restarting the cluster should help get over a failed master.

Eventually we will get rid of the master as a single point of failure when we integrate Zookeeper, which will allow us to run a multi-master configuration.

---
Jim Kellerman, Powerset (Live Search, Microsoft Corporation)

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Jean-Daniel Cryans
> Sent: Friday, November 07, 2008 6:53 AM
> To: [email protected]
> Subject: Re: Question on Availability
>
> Michael,
>
> The Namenode is also a SPOF.
>
> On keeping a separate cluster for failover, until we add higher
> availability, it depends on how much uptime you need to provide I guess. I
> personally never saw a machine hosting a Master failing, so I'm not sure on
> how clean it can be regards the META, but I think that just closing your
> region servers, changing the config for the master then restart the cluster
> with a new master would be nearly enough provided that the Namenode was
> hosted on another machine. Maybe Jim or Stack can confirm?
>
> J-D
>
> On Fri, Nov 7, 2008 at 5:33 AM, Michael Dagaev <[EMAIL PROTECTED]> wrote:
>
> > Hi, all
> >
> > I guess that Hbase master server is a single point of failure. Is
> > it correct ? Does Hbase (I mean the whole stack -- HDFS + Hbase) have
> > any other single point of failure ?
> >
> > If Hbase has a single point of failure we should arrange a backup
> > Hbase cluster to switch to it in case of failure. Does it make sense ?
> >
> > Thank you for your cooperation,
> > M.
