Many thanks to everyone who replied. I would prefer the flexibility of choosing between a filesystem, a database, or a custom backing store for the NameNode data.
Can someone please provide some insight into this?

-Taj


C G-4 wrote:
>
> We are rolling out a production grid with 32 compute nodes. The current
> plan is to try to avoid catastrophic namenode failures by:
>
> 1. Running DRBD and mirroring to another machine
>
> 2. Using multiple namenode volumes to replicate the name space image and
> edit logs to yet another machine (see
> http://wiki.apache.org/lucene-hadoop/FAQ#15).
>
> I'm also considering using filesystem snapshotting as well. The above
> solutions presume that the mode of failure is hardware rather than
> software. A regular snapshot would be useful if something bad happened
> within the Hadoop framework itself and something scribbled all over the
> namenode's data.
>
> As we get DRBD deployed and get to production I'll post more about our
> experiences.
>
> HTH,
> C G
>
> Erich Nachbar <[EMAIL PROTECTED]> wrote:
> Did anyone try DRBD (http://www.drbd.org/) for mirroring the fsimage
> and editlogs to another machine?
>
> Another idea which would involve code changes is to go to something
> like Terracotta (http://www.terracottatech.com/) essentially allowing
> multiple machines simultaneously to play the role of a namenode. I
> only played around with their samples, but if it works as advertised
> it could be a nice way to spread the load and achieve HA.
>
> Disclaimer: Not affiliated with DRBD or Terracotta. Just in need of an
> (ideally automatic) failover solution to protect my weekends.
>
> On Nov 21, 2007, at 6:51 AM, j2eeiscool wrote:
>
>>
>> Hi Dhruba,
>>
>> Thanks for your reply.
>>
>> On the first part (NameNode HA and failover), our experience with
>> NFS has not been very good.
>>
>> Is having a DB as a backing store for NameNode an option (I
>> understand that this may not be part of the current release 0.15.0
>> and would be a new feature)?
>>
>> -Taj
>>
>>
>> Dhruba Borthakur wrote:
>>>
>>> Here is some info on recovering from a failed Namenode:
>>> http://wiki.apache.org/lucene-hadoop/NameNodeFailover
>>>
>>> The fact that there is a single Namenode does mean that it could
>>> possibly become the bottleneck when many thousands of
>>> clients/Datanodes run on the cluster simultaneously. However, the
>>> design is such that it is scalable to a huge number of
>>> clients/Datanodes. Also, work is going on continuously to improve
>>> scalability.
>>>
>>> Thanks,
>>> Dhruba
>>>
>>> -----Original Message-----
>>> From: j2eeiscool [mailto:[EMAIL PROTECTED]
>>> Sent: Tuesday, November 20, 2007 12:47 PM
>>> To: [email protected]
>>> Subject: NameNode HA
>>>
>>>
>>> Hi,
>>>
>>> Based on the documentation I have read, there is one instance of a
>>> NameNode.
>>>
>>> Are there recommended approaches on making the NameNode HA:
>>>
>>> 1. Have a backup which takes over. Data between primary and backup
>>> is shared through shared files, DB, etc.
>>>
>>>
>>> Also does having a single NameNode limit the no. of concurrent HDFS
>>> clients? I understand that HDFS Readers and Writers use the
>>> DataNode(s) eventually, but the initial access point is the NameNode.
>>>
>>> I would really appreciate help on these (I am evaluating HDFS for
>>> use as a Concurrent, Reliable, Performant Distributed File System).
>>>
>>> Thanks,
>>> Taj
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/NameNode-HA-tf4846281.html#a13865411
>>> Sent from the Hadoop Users mailing list archive at Nabble.com.
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/NameNode-HA-tf4846281.html#a13878663
>> Sent from the Hadoop Users mailing list archive at Nabble.com.
>>
>

--
View this message in context: http://www.nabble.com/NameNode-HA-tf4846281.html#a13958583
Sent from the Hadoop Users mailing list archive at Nabble.com.
