Thanks, Ari Rabkin!

1. I think the cost is very low: if the block size is 10 MB, 1 KB of
metadata per block is only about 0.01% of the disk space.
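
A back-of-envelope sketch of that estimate (the ~1 KB of metadata per block
is my assumption, not a measured figure):

    public class MetadataOverhead {
        public static void main(String[] args) {
            // Assumed figures: ~1 KB of embedded metadata per 10 MB block.
            double metadataBytes = 1024.0;
            double blockBytes = 10.0 * 1024 * 1024;
            double overheadPercent = metadataBytes / blockBytes * 100;
            System.out.printf("Overhead: %.4f%% of disk space%n", overheadPercent);
            // Prints: Overhead: 0.0098% of disk space -- roughly the 0.01% above.
        }
    }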

2. Actually, if we lose two racks and replication <= 3, it seems that we
can't recover all the data. But if we lose one rack out of two and
replication >= 2, we can recover all the data.
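
A minimal sketch of that reasoning, assuming the first replica goes on one
rack and the remaining replicas on the other (a simplified model of HDFS's
rack-aware placement, not its exact policy):

    public class RackLossCheck {
        public static void main(String[] args) {
            int replication = 2;                 // try 2 or 3
            int[] replicaRack = new int[replication];
            replicaRack[0] = 0;                  // first replica on rack 0
            for (int i = 1; i < replication; i++) {
                replicaRack[i] = 1;              // remaining replicas on rack 1
            }
            // Losing either single rack still leaves at least one replica.
            for (int deadRack = 0; deadRack <= 1; deadRack++) {
                boolean survives = false;
                for (int rack : replicaRack) {
                    if (rack != deadRack) survives = true;
                }
                System.out.println("Lose rack " + deadRack
                        + " -> block survives: " + survives);
            }
            // Losing both racks loses everything, whatever the replication.
        }
    }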

3. Suppose we recover 87.5% of the data. I am not sure whether a random
87.5% of the data is useful for every user. But when most files are smaller
than the block size, recovered blocks are whole files, so we can recover
that much data intact, and any recovered data may be valuable to some user.
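
For reference, here is the arithmetic behind that 87.5% figure, treating
replica placement as uniformly random across nodes (a simplification; real
placement is rack-aware):

    public class SurvivalEstimate {
        public static void main(String[] args) {
            int replication = 3;
            double fractionDead = 0.5;           // half the cluster is destroyed
            // A block is lost only if all of its replicas were on dead nodes.
            double pBlockLost = Math.pow(fractionDead, replication);  // 0.125
            System.out.printf("Lost: %.1f%%, recoverable: %.1f%%%n",
                    pBlockLost * 100, (1 - pBlockLost) * 100);
            // Prints: Lost: 12.5%, recoverable: 87.5%
        }
    }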

4. I guess most small companies or organizations just have a cluster of
10-100 nodes, and they cannot afford a second HDFS cluster in a different
place or a SAN. This is a simple way to ensure data safety for them, and I
think they would be pleased with it.

5. We can make it configurable: turn it on when someone needs it, and turn
it off otherwise.
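
For example, a minimal sketch of such a switch; the property name
dfs.block.metadata.backup is hypothetical, not an existing Hadoop option:

    import org.apache.hadoop.conf.Configuration;

    public class BackupToggle {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Hypothetical property name; off by default, so nobody pays the
            // cost unless they enable it (e.g. in hadoop-site.xml).
            boolean enabled = conf.getBoolean("dfs.block.metadata.backup", false);
            System.out.println("Block metadata backup enabled: " + enabled);
        }
    }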

Glad to discuss this with you!


2008/9/11 Ariel Rabkin <[EMAIL PROTECTED]>

> I don't understand this use case.
>
> Suppose that you lose half the nodes in the cluster.  On average,
> 12.5% of your blocks were exclusively stored on the half the cluster
> that's dead.  For many (most?) applications, a random 87.5% of the
> data isn't really useful.  Storing metadata in more places would let
> you turn a dead cluster into a corrupt cluster, but not into a working
> one.   If you need to survive major disasters, you want a second HDFS
> cluster in a different place.
>
> The thing that might be useful to you, if you're worried about
> simultaneous namenode and secondary NN failure, is to store the edit
> log and fsimage on a SAN, and get fault tolerance that way.
>
> --Ari
>
> On Tue, Sep 9, 2008 at 6:38 PM, 叶双明 <[EMAIL PROTECTED]> wrote:
> > Thanks for paying attention to my tentative idea!
> >
> > What I thought about isn't how to store the metadata, but a final (or
> > last-resort) way to recover valuable data in the cluster when the worst
> > happens (something that destroys the metadata on all of the multiple
> > NameNodes), e.g. a terrorist attack or natural disaster destroys half of
> > the cluster's nodes, including all NameNodes. With this mechanism we can
> > recover as much data as possible, and we have a big chance of recovering
> > the entire data of the cluster because of the original replication.
> >
> > Any suggestion is appreciated!
> >
> > 2008/9/10 Pete Wyckoff <[EMAIL PROTECTED]>
> >
> >> +1 -
> >>
> >> from the perspective of the data nodes, dfs is just a block-level store
> >> and is thus much more robust and scalable.
> >>
> >>
> >>
> >> On 9/9/08 9:14 AM, "Owen O'Malley" <[EMAIL PROTECTED]> wrote:
> >>
> >> > This isn't a very stable direction. You really don't want multiple
> >> > distinct methods for storing the metadata, because discrepancies are
> >> > very bad. High Availability (HA) is a very important medium term goal
> >> > for HDFS, but it will likely be done using multiple NameNodes and
> >> > ZooKeeper.
> >> >
> >> > -- Owen
> >>
>
> --
> Ari Rabkin [EMAIL PROTECTED]
> UC Berkeley Computer Science Department
>



-- 
Sorry for my English!!  明
Please help me to correct my English expressions and syntax errors
