With ZK in 0.20, a cluster restart won't be necessary. Since the ROOT address will be stored in ZK, clients will practically never communicate with the master, and the region servers will just keep serving regions. If the master fails, the region servers should not block gets/puts, but they won't be able to do splits. However, the new master will have to be started manually (or we could implement a simple way to have extra masters sleeping, just in case) so that it acquires the unique master lock, which will contain its address.
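To give an idea of what that looks like from the client side, here's a rough sketch against the ZooKeeper Java API. This is not the actual 0.20 code; the znode paths and class are made up for illustration, but the mechanics (ephemeral lock node holding the master's address, ROOT location readable without the master) are the ones described above:

```java
import org.apache.zookeeper.*;
import org.apache.zookeeper.data.Stat;

public class MasterLockSketch implements Watcher {
    private final ZooKeeper zk;

    public MasterLockSketch(String quorum) throws Exception {
        zk = new ZooKeeper(quorum, 30000, this);
    }

    /** Try to become master by creating an ephemeral lock znode that
     *  holds this process's address. If the znode already exists,
     *  another master is alive; we stay asleep and watch it. */
    public boolean tryToBecomeMaster(String myAddress) throws Exception {
        try {
            zk.create("/hbase/master", myAddress.getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            return true;                       // we hold the lock
        } catch (KeeperException.NodeExistsException e) {
            zk.exists("/hbase/master", true);  // watch for it going away
            return false;
        }
    }

    /** Clients read the ROOT region location from ZK instead of asking
     *  the master, so a dead master doesn't block gets/puts. */
    public String readRootLocation() throws Exception {
        byte[] data = zk.getData("/hbase/root-region-server", false, new Stat());
        return new String(data);
    }

    public void process(WatchedEvent event) {
        // A NodeDeleted event on /hbase/master means the ephemeral lock
        // vanished: the master died and a sleeping standby can try again.
    }
}
```

Because the lock node is ephemeral, it disappears when the master's ZK session dies, which is what makes the "extra masters sleeping just in case" idea cheap to build.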
BTW, this is all under development, but we (Nitay and I) will follow the Bigtable way.

J-D

On Wed, Jan 7, 2009 at 2:22 PM, Andrew Purtell <[email protected]> wrote:
> You can affect the current situation with a little development
> effort, as we (privately) have at my employer. Comments inline
> below.
>
> > From: Cosmin Lehene
> > My answers inline...
> >
> > On 1/7/09 12:05 PM, "Genady" wrote:
> > > Could somebody explain the expected behavior of HBase 0.18.1
> > > nodes in case of a Hadoop cluster failure, for the following
> > > scenarios:
> > >
> > > - The HBase master region server fails
> >
> > You need to manually set a different machine as master,
> > redistribute the configuration files to all other region
> > servers, and restart the cluster.
> >
> > Maybe someone on the development team could explain whether
> > this will change with the ZooKeeper integration.
>
> First, my understanding is that the ZK integration will handle
> master role reassignment without requiring a cluster restart.
> J-D could say more (or deny).
>
> What we (my employer) do currently is run the Hadoop and HBase
> daemons as child processes of custom monitoring daemons that
> use a private DHT, which supports TTLs on cells, to write
> heartbeats into the cloud. This same mechanism also supports
> service discovery. All HBase processes in particular can be
> automatically restarted should the location of the master shift.
> (The location of the master may shift if a node has a hard
> failure.) We write Hadoop and HBase configuration files on the
> fly as necessary.
>
> This all took me only a few days to implement and a few more
> to debug.
>
> Relocation of the Hadoop namenode is trickier. I believe it
> is possible to have it write the fs image out to an NFS share
> such that a service relocation to another host with the same
> NFS mount will pick up the latest edits seamlessly. However, I
> do not trust NFS under a number of failure conditions, so I will
> not try this myself. There might be other, better strategies
> for replication of the fs image.
>
> > > - An HBase slave region server fails
> >
> > This is handled transparently. Regions served by the failed
> > region server are reassigned to the rest of the region
> > servers.
> >
> > > - The Hadoop master server fails
> >
> > I suppose you mean the HDFS namenode. Currently, the
> > namenode is a single point of failure in HDFS and needs
> > manual intervention to configure a new namenode. A
> > secondary namenode can be configured; however, it only
> > keeps a metadata replica and does not act as a failover node.
> > http://wiki.apache.org/hadoop/NameNode
>
> See my comments above.
>
> > > - A Hadoop slave server fails
> >
> > If an HDFS datanode fails, its files are already replicated
> > on two other datanodes. Eventually the replication will be
> > repaired by the namenode, which creates a third replica on
> > one of the remaining datanodes.
>
> You want to make sure your client is requesting the default
> replication. The stock Hadoop config does allow DFS clients
> to specify a replication factor of only 1. However, the HBase
> DFS client always requests the default, so this is not an
> issue for HBase.
>
> > > - The Hadoop master and the HBase master both fail (in case
> > > they're installed on the same computer and, for instance,
> > > the disk fails)
> >
> > These servers run independently, so see above for what
> > happens to each.
>
> Don't run them on the same node regardless. The Hadoop
> namenode can become very busy given a lot of DFS file system
> activity. Let it have its own dedicated node to avoid problems,
> e.g. replication stalls.
>
> > > - An HBase slave region server fails, but its data can be
> > > recovered and copied to another node, and a new node is
> > > added in place of the failed one.
> >
> > HBase region servers don't actually hold the data. Data
> > is stored in HDFS. Region servers just serve regions, and
> > when a region server fails its regions are reassigned
> > (see above).
> >
> > Cosmin
>
> - Andy
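To make the replication point in Andy's reply concrete, here's a small sketch against the plain Hadoop FileSystem API. The path and numbers are just for illustration, and it assumes whatever filesystem fs.default.name points at is reachable; it shows the difference between taking the default replication and explicitly asking for 1:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // What a client gets if it doesn't ask for anything specific
        // (dfs.replication on the cluster, normally 3).
        short defaultRepl = fs.getDefaultReplication();
        System.out.println("default replication = " + defaultRepl);

        // A client *can* override it per file, e.g. down to 1 -- the
        // pitfall mentioned above. HBase doesn't do this; it takes the
        // default, so region data gets the full replication factor.
        Path p = new Path("/tmp/repl-demo");
        FSDataOutputStream out = fs.create(p, true,
                conf.getInt("io.file.buffer.size", 4096),
                (short) 1,                      // explicit replication of 1
                fs.getDefaultBlockSize());
        out.close();
        System.out.println("replication of " + p + " = "
                + fs.getFileStatus(p).getReplication());

        // It can also be bumped back up after the fact.
        fs.setReplication(p, defaultRepl);
        fs.delete(p, true);
    }
}
```

The case to watch out for is an external DFS client (a bulk loader, say) writing HBase-related files with an explicit factor of 1; anything going through HBase itself gets the default.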
