Andrew, I think you are confusing some components of the stack here. The Namenode is the master for HDFS, just like the HMaster is the master for HBase. Hadoop is two things: HDFS and an implementation of MapReduce, which also has a master, the JobTracker. HBase sits on top of all that.
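
To make the layering concrete, here is a rough sketch in Python. It is only an illustration: the dictionaries and the `masters_needed` helper are made-up names, not part of Hadoop or HBase, and the "sits on" links are a simplified assumption (HBase and MapReduce each placed directly on HDFS).

```python
# Illustrative model only: these names and links are a simplification of
# the stack described above, not an API of Hadoop or HBase.
MASTERS = {
    "HDFS": "Namenode",
    "MapReduce": "JobTracker",
    "HBase": "HMaster",
}

# Which lower layer each layer runs on top of (assumed for illustration).
SITS_ON = {
    "HDFS": [],
    "MapReduce": ["HDFS"],
    "HBase": ["HDFS"],
}


def masters_needed(layer):
    """Return every master process that must be up for `layer` to work."""
    needed = [MASTERS[layer]]
    for lower in SITS_ON[layer]:
        for master in masters_needed(lower):
            if master not in needed:
                needed.append(master)
    return needed


print(masters_needed("HBase"))  # ['HMaster', 'Namenode']
```

The point of the sketch is that HBase depends on two different masters, so making one of them highly available says nothing about the other.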
So with regard to what's fixed, the HMaster SPOF is fixed for 0.20. The Namenode in 0.20 is still a SPOF. That means that if you want HA, you should get a really reliable machine for the Namenode, but you can put the HMaster on any node you want. AFAIK, there is a BackupNamenode in Hadoop 0.21 that serves as a Namenode failover.

J-D

On Tue, Jun 2, 2009 at 10:49 AM, <[email protected]> wrote:
> Occasionally, I think that I am getting all of this, but then a statement
> like this appears:
>
> "To end on a sour note, HDFS Namenode is still a SPOF. When we're done with
> HBase 0.20 it should be the only SPOF."
>
> So now I am confused all over again. I thought that any namenode SPOF that
> was fixed in Hadoop would also imply that it was fixed in HDFS. Doesn't HDFS
> use Hadoop in some form to M/R the reads/writes? If that is not the case and
> HDFS is going to suffer from a namenode SPOF in the near-term, are there
> plans in the works to remedy that too?
>
> -----Original Message-----
> From: ext Ryan Rawson [mailto:[email protected]]
> Sent: 01 June, 2009 16:57
> To: [email protected]
> Subject: Re: State of HA
>
> Hey,
>
> Stack is saying that for HADOOP-4379, it fails 1/5th of the time - recovery
> takes more than 15 minutes, aka potentially unlimited amount of time. That
> patch relies on lease recovery it seems, so it may not be the final answer
> for us.
>
> Now, on the subject of the rest of things, under Zookeeper we are doing a
> much better job at HA. Regionserver crashes are detected significantly faster
> than the 2 minute lease timeout. With my fixes you can take down any
> regionserver without getting 'stuck' with an unassigned ROOT/META
> (previously a problem).
>
> I have noticed on trunk I can kill and restart the master w/o taking down
> the cluster. During master start-up it does a fairly good job at detecting
> node status and otherwise recovering. I can't say about master elections
> exactly yet.
>
> The HA story is shaping up nicely.
>
> To end on a sour note, HDFS Namenode is still a SPOF. When we're done with
> HBase 0.20 it should be the only SPOF.
>
> -ryan
>
> On Mon, Jun 1, 2009 at 1:50 PM, <[email protected]> wrote:
>
>> I am trying to parse this: are you implying that I can expect a 20% ("1 out
>> of 5 or so") success getting HA to work with this code?
>>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On Behalf Of ext stack
>> Sent: 01 June, 2009 13:27
>> To: [email protected]
>> Subject: Re: State of HA
>>
>> You can pull TRUNK and try it with HADOOP-4379.
>>
>> The master failover works as J-D suggests. It needs some polish but that's
>> on its way. The HADOOP-4379 will get you a sync that works most of the time
>> (1 out of 5 or so in my testing) but hopefully that'll be addressed soon
>> too. You'll also need HBASE-1470 (it's the bit of code that exploits
>> HADOOP-4379 when configuration is set right).
>>
>> If you need help setting up stuff, you know where to find us. Issues we
>> want to hear about because we're hoping to tell the above as part of our
>> 0.20.0 release story.
>>
>> Yours,
>> St.Ack
>>
>> On Mon, Jun 1, 2009 at 7:59 AM, <[email protected]> wrote:
>>
>> > Hello,
>> >
>> > I have been looking at Jira and trying to get a current snapshot of the
>> > state of HA for HBase/Hadoop? I know that the zookeeper integration is the
>> > core of the HA story, but when is that slated for a "stable" debut? Is there
>> > anything that is currently in svn that we can pull and test?
>> >
>> > TIA,
>> >
>> > Andrew
