I ask because in S3 and in the P2P storage systems I have been involved
with, we had a replica synchronization algorithm that ran once every night
and relied on techniques like Merkle tree comparisons. Understanding how
this is handled here would be helpful. I don't mind reading through the
sources, but I would appreciate being pointed to the correct package.
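
To make the Merkle tree comparison concrete, here is a minimal Python
sketch. It is only an illustration of the general technique, not code from
S3, Hadoop, or any particular system. The names (Node, build, diff) are
hypothetical, and it assumes both replicas hold the same number of blocks.
Each replica builds a hash tree over its blocks; the trees are compared from
the root down, and only subtrees whose digests differ are descended into.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


@dataclass
class Node:
    digest: bytes
    lo: int                      # node covers blocks[lo:hi]
    hi: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(blocks: List[bytes], lo: int = 0, hi: Optional[int] = None) -> Node:
    """Build a hash tree over blocks[lo:hi] by splitting the range in half."""
    if hi is None:
        hi = len(blocks)
    if hi <= lo:
        raise ValueError("need at least one block")
    if hi - lo == 1:
        # Leaf: hash of the block, prefixed so it cannot collide with an
        # internal node's hash.
        return Node(_h(b"\x00" + blocks[lo]), lo, hi)
    mid = (lo + hi) // 2
    left, right = build(blocks, lo, mid), build(blocks, mid, hi)
    return Node(_h(b"\x01" + left.digest + right.digest), lo, hi, left, right)


def diff(a: Node, b: Node) -> List[int]:
    """Return indices of blocks that differ between two same-shaped trees."""
    if a.digest == b.digest:
        return []                # identical subtree: skip it entirely
    if a.left is None:
        return [a.lo]            # differing leaf
    return diff(a.left, b.left) + diff(a.right, b.right)


replica_a = [b"b0", b"b1", b"b2", b"b3", b"b4"]
replica_b = [b"b0", b"b1", b"b2", b"XX", b"b4"]
out_of_sync = diff(build(replica_a), build(replica_b))
```

In a real system only the digests would cross the network, so two replicas
that agree exchange a single root hash, and the cost of finding a divergent
block grows with the tree depth rather than with the number of blocks.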

Thanks
A

On 7/17/07, Phantom <[EMAIL PROTECTED]> wrote:

I am sure re-replication is not done on every missed heartbeat, since that
would be very expensive and inefficient. At the same time, you cannot really
tell whether a node is partitioned away, crashed, or just slow. Is it
threshold based, i.e. "I missed N heartbeats, so re-replicate"? Which package
in the source code could I look at to glean this information?
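
As a point of reference, a threshold-based scheme of the kind asked about
could be sketched as below. This is a hypothetical Python illustration, not
Hadoop's actual logic: the class name, the 30-second timeout, and the
replication target of 3 are all assumptions. A node is declared dead only
after no heartbeat has arrived for a whole timeout window, and blocks are
queued for re-replication only once their live replica count drops below
the target.

```python
from collections import defaultdict


class ReplicaTracker:
    """Hypothetical master-side bookkeeping for heartbeats and replicas."""

    def __init__(self, timeout_s: float, target_replicas: int):
        self.timeout_s = timeout_s
        self.target = target_replicas
        self.last_seen = {}                    # node -> time of last heartbeat
        self.locations = defaultdict(set)      # block -> nodes holding it

    def heartbeat(self, node: str, now: float) -> None:
        self.last_seen[node] = now

    def add_replica(self, block: str, node: str) -> None:
        self.locations[block].add(node)

    def expire_dead_nodes(self, now: float) -> list:
        """Drop nodes silent for longer than the timeout; return them."""
        dead = [n for n, t in self.last_seen.items()
                if now - t > self.timeout_s]
        for node in dead:
            del self.last_seen[node]
            for holders in self.locations.values():
                holders.discard(node)
        return dead

    def under_replicated(self) -> dict:
        """Map each under-replicated block to the number of copies it lacks."""
        return {block: self.target - len(holders)
                for block, holders in self.locations.items()
                if len(holders) < self.target}


tracker = ReplicaTracker(timeout_s=30, target_replicas=3)
for node in ("A", "B", "C"):
    tracker.heartbeat(node, now=0)
    tracker.add_replica("blk_1", node)

early = tracker.expire_dead_nodes(now=20)    # everyone is within the window
tracker.heartbeat("B", now=25)
tracker.heartbeat("C", now=25)
dead = tracker.expire_dead_nodes(now=40)     # A has been silent for 40 s
pending = tracker.under_replicated()
```

The single timeout stands in for "N missed heartbeats": with a fixed
heartbeat interval the two are equivalent, and a slow node that reports
again before the window closes never triggers any copying.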

Thanks
A

On 7/17/07, Phantom <[EMAIL PROTECTED]> wrote:
>
> That's awesome.
>
> Thanks
> A
>
> On 7/17/07, Doug Cutting < [EMAIL PROTECTED]> wrote:
> >
> > Phantom wrote:
> > > Here is the scenario I was concerned about. Consider three nodes in
> > > the system, A, B and C, which are placed, say, in different racks.
> > > Let us say that the disk on A fries today. Now the blocks that were
> > > stored on A are not going to be re-replicated (this is my
> > > understanding, but I could be wrong in this assumption) to some other
> > > node or to the new disk with which you would bring back A.
> >
> > That's incorrect.  When a datanode fails to send a heartbeat to the
> > namenode in a timely manner then its data is assumed missing and is
> > re-replicated.  And when block corruption is detected, corrupt
> > replicas are removed and non-corrupt replicas are re-replicated to
> > maintain the desired level of replication.
> >
> > Doug
> >
>
>
