I am sure re-replication is not done on every missed heartbeat, since that
would be very expensive and inefficient. At the same time, you cannot really
tell whether a node is partitioned away, crashed, or just slow. Is it
threshold-based, i.e., re-replicate after N missed heartbeats? Which package
in the source code could I look at to find this out?
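
To make the question concrete, this is roughly the kind of threshold check I
have in mind. It is only a sketch of my guess, not Hadoop's actual code: the
class name, the method names, the 3-second interval and the threshold of 10
missed heartbeats are all assumptions I made up for illustration.

```python
# Hypothetical sketch of a threshold-based liveness check.
# Nothing here is taken from the Hadoop source; names and values are assumed.

class HeartbeatMonitor:
    def __init__(self, heartbeat_interval=3.0, missed_threshold=10):
        # A node is declared dead only after this much silence,
        # i.e. after `missed_threshold` consecutive missed heartbeats.
        self.expire_after = heartbeat_interval * missed_threshold
        self.last_seen = {}   # node name -> time of last heartbeat (seconds)
        self.dead = set()     # nodes already declared dead

    def heartbeat(self, node, now):
        """Record a heartbeat; a node that reports again is alive again."""
        self.last_seen[node] = now
        self.dead.discard(node)

    def check(self, now):
        """Return the nodes that have newly crossed the threshold at `now`.

        Each node is reported once, so its blocks would be scheduled for
        re-replication a single time rather than on every missed heartbeat.
        """
        newly_dead = [
            node for node, seen in self.last_seen.items()
            if node not in self.dead and now - seen > self.expire_after
        ]
        self.dead.update(newly_dead)
        return sorted(newly_dead)
```

With a scheme like this, a slow or briefly partitioned node that reports
again before the threshold costs nothing, and the price is a delay before a
real failure is acted on.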

Thanks
A

On 7/17/07, Phantom <[EMAIL PROTECTED]> wrote:

That's awesome.

Thanks
A

On 7/17/07, Doug Cutting <[EMAIL PROTECTED]> wrote:
>
> Phantom wrote:
> > Here is the scenario I was concerned about. Consider three nodes in the
> > system, A, B and C, which are placed, say, in different racks. Let us say
> > that the disk on A fries up today. Now the blocks that were stored on A
> > are not going to be re-replicated (this is my understanding, but I could
> > be wrong in this assumption) to some other node or to the new disk with
> > which you would bring back A.
>
> That's incorrect.  When a datanode fails to send a heartbeat to the
> namenode in a timely manner then its data is assumed missing and is
> re-replicated.  And when block corruption is detected, corrupt replicas
> are removed and non-corrupt replicas are re-replicated to maintain the
> desired level of replication.
>
> Doug
>

