I don't understand this use case. Suppose that you lose half the nodes in the cluster. With the default replication factor of 3, every replica of a block lands in the dead half with probability (1/2)^3, so on average 12.5% of your blocks were stored exclusively on the half of the cluster that died. For many (most?) applications, a random 87.5% of the data isn't really useful. Storing metadata in more places would let you turn a dead cluster into a corrupt cluster, but not into a working one. If you need to survive major disasters, you want a second HDFS cluster in a different place.
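(A quick back-of-envelope check on that 12.5% figure; a minimal sketch assuming replication factor 3 and replicas placed on distinct nodes chosen uniformly at random. Real HDFS placement is rack-aware, so the true fraction differs slightly, and the cluster and block counts here are illustrative.)

    # Estimate the fraction of blocks lost when half the nodes die,
    # assuming each block's replicas sit on distinct random nodes.
    import random

    NODES = 100        # illustrative cluster size
    REPLICATION = 3    # HDFS default replication factor
    BLOCKS = 100000    # illustrative number of blocks

    # Kill a random half of the cluster.
    dead = set(random.sample(range(NODES), NODES // 2))

    lost = 0
    for _ in range(BLOCKS):
        replicas = random.sample(range(NODES), REPLICATION)
        if all(node in dead for node in replicas):
            lost += 1  # every replica was on a dead node

    print("lost %.1f%% of blocks" % (100.0 * lost / BLOCKS))  # ~12.5%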
The thing that might be useful to you, if you're worried about simultaneous namenode and secondary NN failure, is to store the edit log and fsimage on a SAN, and get fault tolerance that way (a config sketch follows below the quoted thread).

--Ari

On Tue, Sep 9, 2008 at 6:38 PM, 叶双明 <[EMAIL PROTECTED]> wrote:
> Thanks for paying attention to my tentative idea!
>
> What I thought about isn't how to store the metadata, but a final
> (last-resort) way to recover valuable data from the cluster when the worst
> happens and the metadata on all of the multiple NameNodes is destroyed.
> For example, if a terrorist attack or natural disaster destroys half of
> the cluster's nodes along with all the NameNodes, we could recover as much
> data as possible through this mechanism, and have a good chance of
> recovering all of the cluster's data because of the original replication.
>
> Any suggestion is appreciated!
>
> 2008/9/10 Pete Wyckoff <[EMAIL PROTECTED]>
>
>> +1 -
>>
>> From the perspective of the data nodes, DFS is just a block-level store
>> and is thus much more robust and scalable.
>>
>> On 9/9/08 9:14 AM, "Owen O'Malley" <[EMAIL PROTECTED]> wrote:
>>
>> > This isn't a very stable direction. You really don't want multiple
>> > distinct methods for storing the metadata, because discrepancies are
>> > very bad. High Availability (HA) is a very important medium-term goal
>> > for HDFS, but it will likely be done using multiple NameNodes and
>> > ZooKeeper.
>> >
>> > -- Owen

--
Ari Rabkin [EMAIL PROTECTED]
UC Berkeley Computer Science Department
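(For reference, a minimal sketch of the configuration Ari describes. In Hadoop of this era, dfs.name.dir takes a comma-separated list of directories, and the NameNode writes its fsimage and edit log to every one of them, so listing a local disk plus a SAN mount keeps an off-node copy of the metadata. The paths below are illustrative.)

    <!-- hadoop-site.xml: the NameNode mirrors its metadata (fsimage and
         edits) into each directory listed here; paths are illustrative. -->
    <property>
      <name>dfs.name.dir</name>
      <value>/local/hadoop/name,/mnt/san/hadoop/name</value>
    </property>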
