You are absolutely correct that the HMaster is currently a single point of failure, just as the name node in an HDFS cluster has been. Work has been done on HDFS to create a backup name node, and eliminating the HMaster as a SPOF will be a focus in the future (first we have to get the HMaster, the HRegionServer, and the client working).
What makes a hot HMaster or HDFS backup name node difficult is the lack of a distributed lock manager (like Google's Chubby). A distributed lock manager project has been proposed on the Hadoop Wiki (see http://wiki.apache.org/lucene-hadoop/DistributedLockServer for the project outline). To date, the focus has been on getting HBase functional in a distributed environment at all (right now it runs only in a single process; see http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture for the latest update on the HBase project), and no one has volunteered to take on the distributed lock manager project. If someone would like to step up and start driving the lock manager project, that would benefit the failover capabilities of both Hadoop and HBase.

-Jim

On Sun, 2007-04-29 at 20:02 -0700, Otis Gospodnetic wrote:
> Hi,
>
> I've read http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture and it
> sounds mostly wonderful! However, I am wondering about this: "Since the
> death of the HMaster means the death of the entire system, there's no reason
> to store this information on disk.". Are there plans to change this, so that
> HMaster is no longer a SPOF?
>
> Thanks,
> Otis
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
> Simpy -- http://www.simpy.com/ - Tag - Search - Share
>

-- 
Jim Kellerman, Senior Engineer; Powerset
[EMAIL PROTECTED]
